随着大数据时代的到来,数据量和数据复杂度急剧提高,Skyline查询结果集规模巨大,无法为用户提供精确的信息。MapReduce作为并行计算框架,已广泛应用于大数据处理中。本文提出了MapReduce框架下基于支配个数的结果优化算法(MR-DMN),解决了大数据环境下的Skyline结果集优化问题。大量的实验表明:算法具有良好的时间和空间效率。
With the advent of the big data era, data volume and complexity have increased sharply, and the result set of a Skyline query has become so large that it cannot provide users with precise information. MapReduce, a parallel computing framework, has been widely applied to big data processing. This paper proposes a result optimization algorithm based on dominating number under the MapReduce framework (MR-DMN), which addresses the problem of optimizing Skyline result sets in big data environments. Extensive experiments show that the algorithm achieves good time and space efficiency.
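
To make the terms in the abstract concrete, the following is a minimal single-process sketch of ranking Skyline points by how many points they dominate. It is not the MR-DMN algorithm itself, whose details are not given here. It assumes that smaller values are better in every dimension, that the "dominating number" of a Skyline point is the number of data points it dominates, and that the data arrive already split into partitions. The names `dominates`, `skyline`, and `top_k_by_dominance` are hypothetical, and the "map" and "reduce" steps are imitated with ordinary Python loops rather than a real MapReduce runtime.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q if p is no worse in every dimension and strictly
    better in at least one (assumption: smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Naive O(n^2) Skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def top_k_by_dominance(partitions: Sequence[Sequence[Point]], k: int) -> List[Tuple[Point, int]]:
    """Return the k Skyline points that dominate the most data points."""
    # "Map" step: each partition computes its local Skyline independently.
    local_skylines = [skyline(part) for part in partitions]

    # "Reduce" step: the global Skyline is a subset of the union of the
    # local Skylines, so it is enough to merge and filter the candidates.
    candidates = list(dict.fromkeys(p for sl in local_skylines for p in sl))
    global_skyline = skyline(candidates)

    # Count, over all partitions, how many points each Skyline point dominates.
    counts: Dict[Point, int] = {
        p: sum(dominates(p, q) for part in partitions for q in part)
        for p in global_skyline
    }

    # Rank by descending count; ties are broken by the point's coordinates.
    ranked = sorted(global_skyline, key=lambda p: (-counts[p], p))
    return [(p, counts[p]) for p in ranked[:k]]


if __name__ == "__main__":
    parts = [[(1, 9), (3, 3), (4, 4)], [(9, 1), (5, 5), (2, 8)]]
    print(top_k_by_dominance(parts, 2))
```

In this sketch the full Skyline of the six sample points has four members, while ranking by dominance count lets a user ask for only the few most representative ones, which is the kind of result-set reduction the abstract motivates.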