With the rapid growth of data volumes in cloud computing environments, there is an urgent need to study how to analyze and process large amounts of data quickly and effectively. Sorting large-scale data efficiently in the cloud is one important problem in this area, and it raises two questions: whether widely used sorting algorithms can achieve high performance, and how many cloud computing resources they consume. This paper studies and implements several efficient sorting algorithms on the Hadoop platform, namely Radix sort, Quicksort, and Sample sort, and analyzes and compares their execution efficiency, CPU consumption, memory consumption, and the volume of communication between machines. Extensive experiments on Hadoop show that, compared with Radix sort and Quicksort, Sample sort on Hadoop sorts faster, balances load better, and consumes less CPU. This result provides a useful basis for designing more efficient, energy-saving algorithms in cloud computing environments.
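To make the load-balancing property of Sample sort concrete, the following is a minimal single-process sketch in Python. It is not the Hadoop implementation evaluated in this paper. The function name `sample_sort` and the parameters `num_partitions`, `oversample`, and `seed` are hypothetical, chosen only for illustration. The idea shown is that splitters drawn from a sorted random sample divide the keys into ranges of roughly equal size, so each partition (a reducer, in a MapReduce setting) receives a similar share of the work.

```python
import bisect
import random


def sample_sort(data, num_partitions=4, oversample=8, seed=0):
    """Sort `data` with a single-process sketch of sample sort.

    Returns (sorted_list, partition_sizes). All names and defaults here
    are illustrative assumptions, not taken from the paper.
    """
    # Tiny inputs or a single partition: fall back to one local sort.
    if num_partitions < 2 or len(data) <= num_partitions:
        return sorted(data), [len(data)]

    # 1. Draw a random sample and sort it.
    rng = random.Random(seed)
    sample_size = min(len(data), num_partitions * oversample)
    sample = sorted(rng.sample(data, sample_size))

    # 2. Pick num_partitions - 1 evenly spaced splitters from the sample.
    step = sample_size / num_partitions
    splitters = [sample[int(i * step)] for i in range(1, num_partitions)]

    # 3. Route every key to the partition whose key range contains it.
    buckets = [[] for _ in range(num_partitions)]
    for key in data:
        buckets[bisect.bisect_right(splitters, key)].append(key)

    # 4. Sort each partition independently and concatenate in order.
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))
    return result, [len(bucket) for bucket in buckets]


if __name__ == "__main__":
    keys = random.Random(1).sample(range(100000), 5000)
    ordered, sizes = sample_sort(keys)
    print(ordered == sorted(keys), sizes)
```

In a distributed setting, step 3 would correspond to the partitioning of map output and step 4 to independent sorting on each reducer. Raising `oversample` generally makes the partition sizes more even, at the cost of a larger sample to sort.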