在多准则决策支持等多个应用中,top-k dominating查询是一种十分实用的查询,它在潜在的巨大的数据空间中返回k个支配分数最大的元组.现有算法,要么需要为特定的属性组合构建索引,要么需要较大的I/O费用或内存费用,从而无法有效处理海量数据上top-k dominating查询.文中提出一种新的查询算法TDEP,该算法利用以较小代价为每个属性构建的有序列表来有效返回海量数据上的top-k dominating查询结果.文中将TDEP算法的执行明确地分为两个阶段:增长阶段和收缩阶段.在每个阶段,TDEP算法以round-robin方式读取涉及到的有序列表并维护候选元组,直到满足结束条件.文中分析了两个阶段的执行行为,提出一种新的不需要重新读取有序列表的支配分数计算方法.同时,文中还提出有效的早剪切操作,可以有效减少TDEP算法需要维护的候选元组数量.实验结果表明:和现有算法相比,TDEP算法具有较大的性能优势.
In many applications like multi-criteria decision making, top-k dominating is a practically useful tool to return k tuples with the highest domination scores in a potentially huge data space. The existing algorithms, either requiring indexes built on the specific attributes, or incurring high I/O cost or memory cost, cannot process top-k dominating query on massive data efficiently. In this paper, a novel algorithm TDEP is proposed to utilize sorted lists built for each attribute with low-cost to return top-k dominating results on massive data efficiently. Through analysis, it is found that TDEP is executed in two phases: growing phase and shrinking phase. In each phase, TDEP retrieves the sorted lists in a round-robin fashion and maintains the candidates until the termination condition is satisfied. The theoretical analysis is provided for the execution behavior in two phases. An efficient method is developed to compute the domination scores for tuples without re-retrieving the sorted lists. Besides, TDEP adopts early pruning to reduce the number of candidate tuples maintained. The extensive evaluation results show the significant performance advantage of TDEP over the existing algorithms.