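Skyline queries have become one of the active research topics in databases and information retrieval. As the volume of data that people can collect and use grows rapidly, processing Skyline queries over massive data has become a pressing problem. The Map-Reduce programming framework, which has emerged in recent years, can effectively support applications built on massive data, and this paper studies how to use it to answer Skyline queries over massive data. The direct way to process a Skyline query under Map-Reduce is to scan the entire data set to obtain the result. In Skyline queries over massive data, however, the result is far smaller than the original data set. This paper therefore proposes a series of Skyline query algorithms and optimizations that effectively filter out some of the data objects that cannot belong to the Skyline result, substantially improving the efficiency of Skyline query processing under Map-Reduce. Extensive experiments on the Hadoop platform verify that the proposed algorithms are effective, accurate, and usable.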
Recently, the Skyline query has become a hot research topic in databases and information retrieval. Meanwhile, the amount of data that people collect and use is growing at an astonishing speed, so processing Skyline queries over massive data is an urgent problem. Map-Reduce is a new parallel programming model that processes vast amounts of data on large clusters and is easy to deploy, which makes it well suited to Skyline queries over massive data. This paper addresses the problem of processing Skyline queries over massive data on the Map-Reduce framework. A straightforward implementation of the Skyline query on Map-Reduce needs to scan all candidate results before obtaining the final results. However, when the number of final results is much smaller than the size of the original data, processing the unnecessary candidates on Map-Reduce is wasted work. Consequently, this paper proposes a series of efficient Skyline query algorithms and optimizations that prune unpromising candidates effectively and improve the performance of processing Skyline queries over massive data on Map-Reduce. Our extensive experiments are built on the Hadoop platform, an open-source implementation of the Map-Reduce framework. The experimental results demonstrate that our algorithms achieve high efficiency, accuracy, and scalability.
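For readers unfamiliar with the Skyline operator, the following minimal sketch illustrates the general idea of pruning dominated points locally before a global merge. It is not the algorithms proposed in this paper, which the abstract does not specify. It assumes that smaller values are better in every dimension, and it uses plain Python lists to stand in for Map-Reduce partitions; the names `dominates`, `skyline`, `map_phase`, and `reduce_phase` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a is no worse than b in every dimension and
    strictly better in at least one (assumption: smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Return the points that are not dominated by any other point."""
    result: List[Point] = []
    for p in points:
        if any(dominates(q, p) for q in result):
            continue  # p is dominated, so it cannot be a Skyline point
        # p survives; drop any earlier candidates that p dominates
        result = [q for q in result if not dominates(p, q)]
        result.append(p)
    return result


def map_phase(partitions: Sequence[Sequence[Point]]) -> List[List[Point]]:
    """Stand-in for the map side: one local Skyline per partition."""
    return [skyline(part) for part in partitions]


def reduce_phase(local_skylines: Sequence[Sequence[Point]]) -> List[Point]:
    """Stand-in for the reduce side: merge local Skylines into the global one."""
    merged = [p for local in local_skylines for p in local]
    return skyline(merged)


# Hypothetical data: (price, distance) pairs split into two partitions.
partitions = [
    [(50, 8), (60, 9), (80, 2)],
    [(40, 9), (70, 3), (90, 1), (95, 5)],
]
local = map_phase(partitions)
result = reduce_phase(local)
print(sorted(result))
```

In this toy run, two of the seven points are discarded inside their own partitions, so the merge step only sees five candidates. The efficiency argument in the abstract rests on the same observation: when the Skyline is much smaller than the data set, early filtering reduces the work left for the final step.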