The default load balancing algorithm of the Hadoop Distributed File System (HDFS) bases its balancing decisions on a single metric, the storage space usage rate, which cannot accurately reflect the actual workload of each server in the cluster. To address this defect, and building on a study of the native HDFS load balancing mechanism and of improved algorithms derived from it, this paper proposes a load balancing method based on multiple metrics. The method defines a metric function, uses it to compute the load of each DataNode in the cluster, and makes balancing decisions according to the computed values. Experimental results show that the method effectively improves load balancing in HDFS.
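The idea of a multi-metric load function can be illustrated with a minimal sketch. The abstract does not state which metrics or weights the paper uses, so everything below is an assumption made for illustration: the three metrics (disk, CPU, network), the weights in `WEIGHTS`, the weighted-sum form of `load_score`, and the `threshold` band around the cluster mean are all hypothetical and should not be read as the paper's actual function.

```python
from dataclasses import dataclass


@dataclass
class DataNodeStats:
    """Per-node measurements, each normalised to the range 0..1 (assumed)."""
    name: str
    disk_usage: float      # fraction of storage space in use
    cpu_usage: float       # fraction of CPU in use (hypothetical metric)
    network_usage: float   # fraction of bandwidth in use (hypothetical metric)


# Hypothetical weights; the paper's actual coefficients are not given here.
WEIGHTS = {"disk_usage": 0.5, "cpu_usage": 0.3, "network_usage": 0.2}


def load_score(node, weights=WEIGHTS):
    """Weighted sum of the node's metrics: one possible metric function."""
    return sum(w * getattr(node, metric) for metric, w in weights.items())


def classify(nodes, threshold=0.1, weights=WEIGHTS):
    """Split nodes into over- and under-loaded sets around the cluster mean.

    A node is over-loaded if its score exceeds mean + threshold and
    under-loaded if it falls below mean - threshold (assumed rule).
    """
    scores = {n.name: load_score(n, weights) for n in nodes}
    mean = sum(scores.values()) / len(scores)
    over = [name for name, s in scores.items() if s > mean + threshold]
    under = [name for name, s in scores.items() if s < mean - threshold]
    return scores, over, under


if __name__ == "__main__":
    cluster = [
        DataNodeStats("dn1", disk_usage=0.9, cpu_usage=0.8, network_usage=0.7),
        DataNodeStats("dn2", disk_usage=0.5, cpu_usage=0.5, network_usage=0.5),
        DataNodeStats("dn3", disk_usage=0.2, cpu_usage=0.1, network_usage=0.1),
    ]
    scores, over, under = classify(cluster)
    print(scores, over, under)
```

Under this sketch, two nodes with the same disk usage but different CPU or network load receive different scores, which is the distinction a disk-only metric cannot make. Blocks would then be moved from nodes in the over-loaded set toward nodes in the under-loaded set.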