Abstract: Because the nodes in a cluster have different hardware configurations, they also differ in performance, which makes the cluster a heterogeneous environment. However, MapReduce implementations, of which Hadoop is the most widely used, do not sufficiently account for this heterogeneity. As a result, too many map tasks are not data-local: they must fetch their input data blocks from other nodes, which degrades MapReduce performance in heterogeneous environments. A novel replica placement strategy based on node performance is proposed. It matches the distribution of block replicas to the performance of the nodes. It also takes three further factors into account: reliability, the transfer overhead of creating replicas, and the balance of performance across data blocks. Results show that, in heterogeneous environments, the strategy effectively increases the proportion of data-local map tasks and shortens the completion time of MapReduce jobs.
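The core idea of matching replica distribution to node performance can be illustrated with a small greedy sketch. This is not the paper's algorithm, which the abstract does not specify. It is a hypothetical illustration under these assumptions:

- Each node has a single positive performance score (the `perf` dict).
- Every block is stored with a fixed replication factor.
- Reliability is modeled only as "replicas of one block go to distinct nodes."

Each node should end up holding a share of replicas proportional to its score. The function name `place_replicas` and its parameters are invented for this example.

```python
from collections import Counter


def place_replicas(perf, num_blocks, replication=3):
    """Hypothetical sketch: spread `replication` replicas of each block over
    distinct nodes so each node's replica count tracks its performance share.

    perf: dict mapping node name -> positive performance score (assumed metric).
    Returns a list whose entry b is the list of nodes holding block b.
    """
    if replication > len(perf):
        raise ValueError("replication factor exceeds number of nodes")
    total_perf = sum(perf.values())
    total_replicas = num_blocks * replication
    # Ideal (fractional) number of replicas each node should hold,
    # proportional to its share of total cluster performance.
    target = {n: p / total_perf * total_replicas for n, p in perf.items()}
    held = Counter()
    placement = []
    for _ in range(num_blocks):
        # Rank nodes by how far they are below their target (most deficit first);
        # break ties by name so the result is deterministic.
        ranked = sorted(perf, key=lambda n: (held[n] - target[n], n))
        # Take the top `replication` nodes: all distinct, so losing one node
        # never loses every copy of the block.
        chosen = ranked[:replication]
        for n in chosen:
            held[n] += 1
        placement.append(chosen)
    return placement


if __name__ == "__main__":
    cluster = {"fast": 4.0, "mid": 2.0, "slow": 1.0, "slower": 1.0}
    layout = place_replicas(cluster, num_blocks=8, replication=2)
    print(Counter(n for block in layout for n in block))
```

With the example cluster (8 blocks, 2 replicas each), the replica counts come out as fast 8, mid 4, slow 2, slower 2. These counts exactly match the 4:2:1:1 performance ratio. The fastest node therefore holds a copy of every block, so any map task scheduled there is data-local. The sketch deliberately leaves out the other factors the abstract mentions: rack-level reliability, the transfer cost of creating replicas, and balancing aggregate performance across blocks. A fuller policy would add these as extra constraints or terms in the ranking key.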