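When attribute domains are partially ordered, the resulting skyline is almost as large as the original data set, because in most cases points are incomparable in at least one dimension. Pruning a large data set to a manageable size while preserving the points of interest is therefore a problem worth studying. To obtain a smaller, more useful skyline that better reflects real user preferences, this paper proposes a more general notion of dominance based on two assumptions: preference specifications are incomplete, and actual preferences are transitive.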
The skyline of a set P of multi-dimensional points (tuples) consists of those points in P for which no clearly better point exists in P, i.e., no point in P is at least as good in every dimension of interest and strictly better in at least one, using component-wise comparison on the domains of interest. The guiding idea is to prune large data sets to a more manageable size while ensuring that points of interest are preserved. However, when domains are only partially ordered, the skyline can easily end up nearly as large as the original set (or at least of the same order of magnitude), since, most of the time, points are incomparable in at least one dimension. To obtain a smaller, more useful skyline set that better reflects actual user preferences, we propose a richer notion of dominance based on two assumptions: that preference specifications are often incomplete, and that actual preferences are transitive. Experiments on both real and synthetic data sets show that our new skyline notion scales well and is highly accurate in terms of user expectations.
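To make the definitions concrete, the following is a minimal Python sketch of the classic component-wise skyline over one totally ordered dimension (price, lower is better) and one partially ordered categorical dimension (brand). The hotel data, the brands A to D, the stated preference pairs, and all helper names are hypothetical. Taking the transitive closure of the stated pairs only illustrates the transitivity assumption; it is not the richer dominance notion proposed in this work, which the sketch does not attempt to reproduce.

```python
from itertools import product

# Hypothetical brand preferences stated by a user, as (better, worse) pairs.
# The specification is incomplete: A > C, A > D and B > D are never stated.
STATED = {("A", "B"), ("B", "C"), ("C", "D")}


def transitive_closure(pairs):
    """Warshall-style closure: from i > k and k > j, infer i > j."""
    closure = set(pairs)
    items = {x for pair in pairs for x in pair}
    for k in items:
        for i, j in product(items, repeat=2):
            if (i, k) in closure and (k, j) in closure:
                closure.add((i, j))
    return closure


def make_geq(better):
    """Build an 'at least as good as' test from a strict preference relation."""
    return lambda a, b: a == b or (a, b) in better


def geq_price(a, b):
    return a <= b  # lower price is better (a total order)


def dominates(p, q, geq):
    """p dominates q: at least as good on every dimension, strictly better on one."""
    at_least = all(g(a, b) for g, a, b in zip(geq, p, q))
    strictly = any(g(a, b) and not g(b, a) for g, a, b in zip(geq, p, q))
    return at_least and strictly


def skyline(points, geq):
    """Naive O(n^2) skyline: keep each point that no point in the set dominates."""
    return [q for q in points if not any(dominates(p, q, geq) for p in points)]


# Hypothetical (price, brand) tuples.
hotels = [(100, "A"), (140, "B"), (130, "C"), (120, "D")]

# Dominance using only the stated pairs: brands A, C and D stay incomparable.
as_stated = skyline(hotels, [geq_price, make_geq(STATED)])
# Dominance after assuming the stated preferences are transitive.
with_closure = skyline(hotels, [geq_price, make_geq(transitive_closure(STATED))])

print(as_stated)     # [(100, 'A'), (130, 'C'), (120, 'D')]
print(with_closure)  # [(100, 'A')]
```

With the preferences as stated, A is never compared with C or D, so three of the four hotels survive even though the user's chain A > B > C > D implies that (100, "A") is better than all of them; once transitivity is applied, only (100, "A") remains. The quadratic nested loop is meant only for illustration; larger data sets would call for a more efficient skyline algorithm.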