东篱科研大数据发现系统（DRDS）

位置：成果数据库 > 期刊 > 期刊详情页

Hadoop平台中一种Reduce负载均衡贪心算法

ISSN号：1001-3695
期刊名称：《计算机应用研究》
时间：0
分类：TP301.6[自动化与计算机技术—计算机系统结构;自动化与计算机技术—计算机科学与技术]
作者机构：[1]中南大学软件学院,长沙410075
相关基金：国家自然科学基金资助项目(61103202);国家教育部博士点基金资助项目(20110162120046);中南大学教师研究基金资助项目(2014JSJJ019)

关键词： MAPREDUCE, 贪心算法, Reduce负载均衡, 抽样, MapReduce, greedy algorithm, load balancing of Reduce, sampling

中文摘要：

MapReduce是目前广泛应用的并行计算框架，是Hadoop平台的重要组成部分。主要包括Map和Reduce函数，Map函数输出key-value键值对作为Reduce的输入。由于输入的动态性，不同主机上的Reduce处理的输入量存在不均衡性。如何解决Reduce的负载均衡是优化MapReduce的一个重要研究方向。对整体数据进行抽样，通过适量的样本分析数据，达到较小的代价获得可靠的key分布，提出贪心算法代替Hadoop平台默认的Hash算法来划分数据，实现Reduce负载均衡。提出的贪心算法主要思想是根据抽样数据，求取所有key频次的和对于Reduce节点数量的平均值，然后依次为每一个Reduce分配一个接近平均值的负载，从而达到整体的负载均衡。模拟实验表明，所提算法与默认的hash分区算法相比，运行时间节约10.6%，达到更好的负载均衡。

英文摘要：

MapReduce is used wildly as a parallel computing framework, mainly including the Map function and Reduce function.Map function has the output of the key-value pairs, which are the input of the Reduce function. As a result, the input of Reduce is dynamic. The load balancing of Reduce hosts has an important impact on the efficiency of MapReduce. This paper sampled the overall data. The aim was to obtain reliable key distribution at a cheap price. Then, it proposed a greedy algorithm to divide data to achieve Reduce load balancing, taking the place of hash algorithm. The main idea of the proposed greedy algorithm was to assign the right job to a Reduce host for the best load balancing in each step. Simulation results show that proposed algorithm has better performance than the other two algorithms. Compared with the default Hash partitioning algorithm,the proposed algorithm decreases the running CPU time by 10. 6% , and achieves better load balancing.

同期刊论文项目