Hadoop is a platform for storing and processing big data. In a heterogeneous Hadoop cluster, distributing data equally across nodes tends to degrade the computing performance of the system. To address this problem, a big data placement strategy that adaptively balances data storage is proposed. Data is assigned to nodes in proportion to each node's computing capacity in the heterogeneous cluster. During task processing, the capacity ratios of the nodes are updated dynamically from feedback on task completion times, and the data distribution is adjusted accordingly. As a result, the nodes of the heterogeneous Hadoop cluster take roughly the same time to process their data, less data is moved between nodes, and node utilization improves. Experimental results show that the proposed strategy effectively shortens task completion time and improves the overall performance of the system.
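
The two steps described above, capacity-proportional placement and feedback-driven ratio updates, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the update formula, so the exponential smoothing with factor `alpha`, the function names `allocate_blocks` and `update_ratios`, and the node names are all assumptions made for illustration, and nothing here calls a real Hadoop API.

```python
from typing import Dict


def allocate_blocks(total_blocks: int, ratios: Dict[str, float]) -> Dict[str, int]:
    """Split total_blocks across nodes in proportion to their capacity ratios."""
    total = sum(ratios.values())
    # Exact (fractional) proportional share for each node.
    shares = {node: total_blocks * r / total for node, r in ratios.items()}
    # Start from the floor of each share.
    alloc = {node: int(share) for node, share in shares.items()}
    # Hand out the remaining blocks to the largest fractional remainders.
    leftover = total_blocks - sum(alloc.values())
    by_remainder = sorted(shares, key=lambda n: shares[n] - alloc[n], reverse=True)
    for node in by_remainder[:leftover]:
        alloc[node] += 1
    return alloc


def update_ratios(
    ratios: Dict[str, float],
    blocks_done: Dict[str, int],
    elapsed: Dict[str, float],
    alpha: float = 0.5,  # hypothetical smoothing factor, not from the paper
) -> Dict[str, float]:
    """Blend the old ratios with throughput observed from completed tasks."""
    total = sum(ratios.values())
    old = {node: r / total for node, r in ratios.items()}

    # Observed throughput in blocks per second, only for nodes with feedback.
    throughput = {
        node: blocks_done[node] / elapsed[node]
        for node in ratios
        if blocks_done.get(node, 0) > 0 and elapsed.get(node, 0.0) > 0.0
    }
    if not throughput:
        return old  # no feedback yet: keep the current ratios

    tp_total = sum(throughput.values())
    observed = {node: tp / tp_total for node, tp in throughput.items()}

    # Assumed update rule: exponential smoothing of old and observed shares.
    # Nodes without feedback keep their old share.
    blended = {
        node: (1 - alpha) * old[node] + alpha * observed.get(node, old[node])
        for node in ratios
    }
    norm = sum(blended.values())
    return {node: value / norm for node, value in blended.items()}


if __name__ == "__main__":
    ratios = {"fast": 3.0, "medium": 2.0, "slow": 1.0}
    placement = allocate_blocks(100, ratios)
    print(placement)
    # Feed back how long each node took to process its blocks, then re-plan.
    elapsed = {"fast": 40.0, "medium": 45.0, "slow": 60.0}
    ratios = update_ratios(ratios, placement, elapsed)
    print(allocate_blocks(100, ratios))
```

Under these assumptions, a node that finishes its share faster than its peers receives a larger ratio in the next round, which pushes per-node processing times toward each other and reduces the need to move data between nodes at run time.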