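The default Hadoop replica placement strategy ignores the heterogeneity of node hardware configurations, file access characteristics, and real-time load. In heterogeneous environments this lowers the proportion of data-local computing tasks and degrades computing performance. To address this, a replica placement optimization strategy for computational data is proposed. Data access characteristics and the real-time performance and load of each node are quantified, and storage nodes are selected for replicas on the principle that a node's data access load should match its computing performance. Experimental results show that, compared with the default strategy, the optimized replica placement strategy selects suitable storage nodes for replicas more effectively, improves the proportion of data-local computing tasks and computing performance, and makes the cluster more adaptable to node changes.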
The default data placement strategy of the Hadoop distributed file system does not consider hardware heterogeneity among cluster nodes, data access characteristics, or real-time workloads. It therefore hinders the use of data locality in Map tasks and degrades cluster computing performance. An optimized replica placement strategy for computational data is presented. Taking into account data access features as well as the real-time performance and workload of each node, the strategy chooses storage nodes for data replicas on the principle of matching each node's data access load to its computing performance. The results show that, compared with the default strategy, the proposed strategy improves the computing performance of a heterogeneous cluster by enhancing the data locality of Map tasks. Furthermore, a cluster that applies the optimized strategy is more stable and more resilient to node changes.
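
The matching principle stated above can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not give the quantification formulas, so the performance score `perf`, the accumulated access load `load`, the even split of a block's expected access load across its replicas, and the function name `place_replicas` are all hypothetical assumptions made for illustration. The sketch simply prefers nodes whose load-to-performance ratio would remain lowest after receiving a replica.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    perf: float        # hypothetical compute-performance score (higher is faster)
    load: float = 0.0  # hypothetical accumulated data-access load


def place_replicas(nodes, access_load, replicas=3):
    """Choose `replicas` distinct nodes for one block (illustrative only).

    Assumes the block's expected access load is shared evenly among its
    replicas, and ranks nodes by the load-to-performance ratio they would
    have after receiving one replica.
    """
    if replicas > len(nodes):
        raise ValueError("not enough nodes for the requested replica count")
    share = access_load / replicas
    ranked = sorted(nodes, key=lambda n: (n.load + share) / n.perf)
    chosen = ranked[:replicas]
    for node in chosen:
        node.load += share  # record the load the new replica is expected to add
    return [node.name for node in chosen]


if __name__ == "__main__":
    cluster = [Node("fast", 4.0), Node("mid", 2.0), Node("slow", 1.0)]
    print(place_replicas(cluster, access_load=3.0, replicas=2))
```

Under these assumptions, faster nodes absorb more access load before a slower node becomes the better choice, which is the behavior the matching principle aims for. A real implementation would also have to respect HDFS constraints such as rack awareness, which this sketch ignores.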