Massive XML data are increasingly generated for the representation, storage, and exchange of web information, and twig query processing over such data has become a research focus. However, most traditional algorithms cannot be directly implemented in a distributed manner. Some existing distributed algorithms generate many useless intermediate results and, in most cases, execute many join operations over partial results. Others require a priori knowledge of the query pattern before XML partitioning, storage, and query processing, which is impractical for large-scale data or when new queries arrive frequently. To improve efficiency and scalability, in this paper we propose DisT3, a 3-phase distributed algorithm based on a node distribution mechanism that avoids unnecessary intermediate results. Furthermore, we propose ReP, a lightweight local index, together with an enhanced XML partitioning approach that supports arbitrary partitioning strategies. Based on ReP, we propose DisT2ReP, an improved 2-phase distributed algorithm that further reduces the communication cost. After analyzing the performance guarantees, we conduct extensive experiments to verify the efficiency and scalability of the proposed algorithms in distributed twig query applications.
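For readers unfamiliar with the term, a twig query is a tree-shaped pattern with branches that is matched against an XML document. The following is a minimal single-machine sketch of that idea only. It is not DisT3 or DisT2ReP, and the sample document, the `TWIG` pattern encoding, and the function names are assumptions introduced for illustration. Only parent-child edges are handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not taken from the paper.
DOC = """
<library>
  <book><title>A</title><author>X</author></book>
  <book><title>B</title></book>
  <journal><title>C</title><author>Y</author></journal>
</library>
"""

# A twig pattern encoded as (tag, [child patterns]).
# This one asks for every <book> that has both a <title> and an <author> child.
TWIG = ("book", [("title", []), ("author", [])])


def matches(node, pattern):
    """Return True if the subtree rooted at node satisfies the pattern."""
    tag, child_patterns = pattern
    if node.tag != tag:
        return False
    # Every branch of the twig must be satisfied by at least one child element.
    return all(
        any(matches(child, cp) for child in node)
        for cp in child_patterns
    )


def twig_query(root, pattern):
    """Naive matcher: test every element in the document as a candidate root."""
    return [node for node in root.iter() if matches(node, pattern)]


root = ET.fromstring(DOC)
result = twig_query(root, TWIG)
titles = [node.findtext("title") for node in result]
print(titles)
```

The naive matcher visits every element and re-checks each branch, which is the kind of repeated work over partial matches that becomes costly once the document is split across machines.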