Data load balancing in Hadoop has a significant effect on platform performance. We first analyse the limitations of the default data load balancing in HDFS (Hadoop Distributed File System), whose algorithm considers only storage space utilisation and ignores the heterogeneity between nodes. To address this problem, we propose a mathematical model that quantifies data load balancing in heterogeneous clusters. The model calculates a theoretical space utilisation for each node from its storage space and its performance, and dynamically adjusts each node's maximum load according to the current storage space utilisation of the cluster. Experimental results show that the proposed strategy brings heterogeneous clusters to a more reasonable balanced state, improves cluster efficiency, and effectively reduces job execution time.
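The contrast between the two balancing criteria can be illustrated with a minimal sketch. The abstract does not give the model's formula, so the weighting below (target data share proportional to capacity times a relative performance score) is purely an assumption for illustration, as are the names `Node`, `perf`, `default_balanced` and `target_utilisation`. The first check is a simplified version of the default HDFS criterion, in which every node's utilisation must lie within a threshold of the cluster-wide utilisation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Node:
    name: str
    capacity_gb: float  # storage allocated to HDFS on this node
    used_gb: float      # storage currently used
    perf: float         # hypothetical relative performance score (assumption)


def cluster_utilisation(nodes: List[Node]) -> float:
    """Cluster-wide utilisation: total used space over total capacity."""
    return sum(n.used_gb for n in nodes) / sum(n.capacity_gb for n in nodes)


def default_balanced(nodes: List[Node], threshold: float = 0.10) -> bool:
    """Simplified default criterion: each node's utilisation is within
    `threshold` of the cluster utilisation; node performance is ignored."""
    avg = cluster_utilisation(nodes)
    return all(abs(n.used_gb / n.capacity_gb - avg) <= threshold for n in nodes)


def target_utilisation(nodes: List[Node]) -> Dict[str, float]:
    """Assumed heterogeneity-aware target, not the paper's formula: each
    node's share of the data is proportional to capacity * perf, so its
    target utilisation is proportional to perf. Targets are capped at 1.0;
    redistributing any capped excess is omitted from this sketch."""
    total_used = sum(n.used_gb for n in nodes)
    weight_sum = sum(n.capacity_gb * n.perf for n in nodes)
    return {n.name: min(1.0, total_used * n.perf / weight_sum) for n in nodes}


if __name__ == "__main__":
    nodes = [Node("fast", 100.0, 50.0, 2.0), Node("slow", 100.0, 50.0, 1.0)]
    print(default_balanced(nodes))    # space-only view of the cluster
    print(target_utilisation(nodes))  # performance-weighted targets
```

With equal `perf` scores the assumed target reduces to the cluster-wide utilisation, so the sketch agrees with the default criterion on a homogeneous cluster and diverges from it only when nodes differ in performance.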