In a distributed cluster system, data are partitioned across the nodes of the cluster by a data partitioning algorithm, which incurs expensive network communication for complex queries that involve many join operations. To address this problem, two algorithms based on the Information Network Model (INM) are proposed: the Minimum Traffic Query Split algorithm (MTQS) and the Multi-Objective Query Optimization algorithm (MOQO). MTQS splits a complex query into several PWOC (parallelizable without communication) sub-queries, all of which can be executed in parallel with almost no communication. MOQO treats these sub-queries as the basic operations of a query plan, takes both parallelism and communication cost as optimization objectives, and builds the query plan tree using the traditional multi-objective weighted-sum method combined with a greedy strategy as its evaluation criterion. Finally, test data are generated with the TPC-H benchmark, and the original and optimized algorithms are compared experimentally. The results show that the optimized algorithm greatly improves the efficiency of complex queries.
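The abstract does not specify MOQO's cost model, so the following is only a minimal sketch of the general idea of combining a weighted sum of two objectives with a greedy merge, not the paper's algorithm. Everything concrete here is a hypothetical assumption: the `PlanNode` and `build_plan` names, the sub-query names `Q1` to `Q4` with their row estimates, the join edges, the weights, the rule that shipping the smaller input costs `min(rows)` tuples, the use of tree depth as a proxy for lost parallelism, and the foreign-key-style estimate that a join outputs `max(rows)` tuples.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional


@dataclass
class PlanNode:
    name: str
    rows: int                       # estimated output cardinality (assumed known)
    depth: int = 0                  # subtree height; lower means more parallelism
    tables: frozenset = frozenset()
    left: Optional["PlanNode"] = None
    right: Optional["PlanNode"] = None


def leaf(name, rows, tables):
    """A PWOC sub-query result, treated as a basic plan operation."""
    return PlanNode(name, rows, 0, frozenset(tables))


def build_plan(subqueries, join_edges, w_comm=0.5, w_par=0.5):
    """Greedily merge sub-query results into a join tree (hypothetical model).

    join_edges: set of frozenset({t1, t2}) table pairs linked by a join predicate.
    Each step joins the pair with the lowest weighted score, where
    score = w_comm * normalized traffic + w_par * normalized depth.
    """
    nodes = list(subqueries)
    max_rows = max(n.rows for n in nodes)
    n_total = len(nodes)
    while len(nodes) > 1:
        best = None
        for a, b in combinations(nodes, 2):
            # Only consider pairs connected by a join predicate (no Cartesian products).
            if not any(frozenset({x, y}) in join_edges
                       for x in a.tables for y in b.tables):
                continue
            comm = min(a.rows, b.rows) / max_rows   # ship the smaller input
            depth = 1 + max(a.depth, b.depth)
            score = w_comm * comm + w_par * depth / n_total
            if best is None or score < best[0]:
                best = (score, a, b, depth)
        if best is None:
            raise ValueError("sub-queries are not connected by join predicates")
        _, a, b, depth = best
        merged = PlanNode(f"({a.name} JOIN {b.name})", max(a.rows, b.rows),
                          depth, a.tables | b.tables, a, b)
        # Replace the two inputs with their join (identity comparison on purpose).
        nodes = [n for n in nodes if n is not a and n is not b] + [merged]
    return nodes[0]


# Hypothetical TPC-H-style sub-queries and join predicates.
subqueries = [
    leaf("Q1", 6000, {"lineitem", "orders"}),
    leaf("Q2", 1500, {"customer", "nation"}),
    leaf("Q3", 800, {"part", "partsupp"}),
    leaf("Q4", 100, {"supplier"}),
]
join_edges = {
    frozenset({"orders", "customer"}),
    frozenset({"lineitem", "part"}),
    frozenset({"partsupp", "supplier"}),
}
plan = build_plan(subqueries, join_edges)
print(plan.name, "depth =", plan.depth)
```

With these made-up numbers the greedy step first joins the cheap `Q3`/`Q4` pair, then `Q1`/`Q2` (since attaching to the existing join would deepen the tree), yielding a bushy tree of depth 2 instead of a left-deep tree of depth 3. Raising `w_par` favors balanced, more parallel trees, and raising `w_comm` favors shipping fewer tuples.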