Unlike traditional energy-saving algorithms for data centers, an energy-saving algorithm for HDFS (Hadoop Distributed File System) must guarantee the availability of every data block in the cluster, because MapReduce computing tasks depend on the data they process. That is, for each data block, at least one copy (the block itself or one of its replicas) must be in the active state. Based on the HDFS cluster structure and its block storage characteristics, we build a DataNode matrix, a DataNode status matrix, a file block matrix, a block storage matrix, and a block status matrix, which together form the foundational model for the subsequent work. Using the relationship between the block status matrix and block availability, we design a verification algorithm that determines whether a DataNode can be put to sleep. A probabilistic analysis shows that, because the rack-aware replica placement policy distributes data blocks randomly, it is generally not possible to save energy by putting DataNodes to sleep unless the block storage structure or the replica placement policy is changed. We therefore design two algorithms: a block storage structure configuration energy-saving algorithm and an energy-saving algorithm based on a symmetric replica placement policy. They improve the energy efficiency of HDFS by changing the block storage structure and the replica placement policy, respectively. Mathematical analysis and experimental results show that both algorithms address the problem of high energy consumption at low utilization in HDFS clusters, and that the lower the cluster load, the greater the energy savings.
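The availability constraint behind the DataNode sleep verification can be illustrated with a minimal sketch. It is not the algorithm of this paper: it assumes the block storage matrix is represented as a mapping from block IDs to the set of DataNodes holding a replica, and the names `can_sleep`, `greedy_sleep`, and `example_placement` are hypothetical. A node may sleep only if every block still has at least one replica on a node that remains active.

```python
from typing import Dict, Iterable, Set


def can_sleep(node: str, placement: Dict[str, Set[str]], active: Set[str]) -> bool:
    """Return True if `node` can sleep while every block stays available.

    `placement` maps a block ID to the DataNodes storing a replica of it
    (an assumed stand-in for the block storage matrix).
    `active` is the set of DataNodes that are currently awake.
    """
    if node not in active:
        return False  # the node is already asleep, nothing to verify
    remaining = active - {node}
    # Availability: each block needs at least one replica on an active node.
    return all(replicas & remaining for replicas in placement.values())


def greedy_sleep(placement: Dict[str, Set[str]], nodes: Iterable[str]) -> Set[str]:
    """Put nodes to sleep one by one (in sorted order) whenever it is safe.

    Returns the set of DataNodes that must stay active. This greedy pass
    is only an illustration and does not claim to find a minimum set.
    """
    active = set(nodes)
    for node in sorted(active):
        if can_sleep(node, placement, active):
            active.discard(node)
    return active


# Hypothetical example: three blocks, two replicas each, three DataNodes.
example_placement = {
    "b1": {"dn1", "dn2"},
    "b2": {"dn2", "dn3"},
    "b3": {"dn1", "dn3"},
}
still_active = greedy_sleep(example_placement, ["dn1", "dn2", "dn3"])
```

In this example only one of the three nodes can sleep, since putting a second node to sleep would leave one block with no active replica. When replicas are spread randomly over many nodes, such safe candidates become rare, which is consistent with the motivation given above for changing the storage structure or the placement policy.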