由于在多标准决策支持等应用上具有重要的意义,skyline查询成为近年来数据库和数据挖掘领域的一个研究热点.然而随着数据集维数的增加,数据点之间形成支配关系的可能性越来越小,导致了skyline点数目过多而无法提供任何有效信息.为了在高维数据集中找到更重要和更有意义的skyline点,人们提出了k-支配skyline的定义.但现有的用于k-支配skyline的算法在时间效率、空间复杂度和渐进输出性上都有待提高.该文提出了一种基于索引的高效k-支配skyline算法,通过为数据集建立两个索引,算法可以高效地进行计算,在时间、空间和渐进性上均优于现有的算法.
Due to the importance for several applications involving multi-criteria decision making, skyline query has received a lot of attention in the research field of database and data mining in recent years. However, as the number of dimensions increase, the possibility to form dominant relationship between data points is very low. As a result, the number of skyline points becomes too numerous to provide any useful information. For the sake of finding more important and more meaningful skyline points in high dimension data set, a new concept called k-dominant skyline was proposed. Currently the algorithms used for k-dominant skyline query do not have good performance on the time and space complexity. None of them is a good online algorithm. This paper proposes a new index based algorithm, by building two index of the data set, the algorithm can efficiently compute k-dominant skyline. It is an online algorithm and has better performance than all of current algorithms.