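Energy management in data centers has become a major concern in large-scale data processing; its main goal is to curb the rapid growth of the associated costs. Much prior work reduces energy consumption by shutting down some servers when cluster utilization is low, but these methods are severely constrained by the data storage policy and have difficulty guaranteeing the real-time performance of applications. As a popular platform for large-scale data processing, the MapReduce cluster has a particularly prominent energy consumption problem. For heterogeneous MapReduce clusters, this paper proposes Seadown, an SLA-oriented energy management method. First, a hybrid data replica storage policy is proposed that allows a large number of nodes to be shut down while preserving data integrity and the cluster's fault tolerance. Second, a history-based response time prediction method is designed; it accurately estimates a program's response time from historical information on the number of server nodes, their performance parameters, and running times, with relative errors mostly below 6%. Finally, some nodes are selectively shut down to minimize energy consumption while guaranteeing the real-time performance of applications. The paper proves that this optimization problem is NP-hard and proposes a heuristic node shutdown strategy. Experimental results show that, under the node shutdown strategy, the real-time performance of MapReduce applications is guaranteed while energy consumption is reduced substantially.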
Power consumption accounts for a large proportion of the operating cost of data centers, adding substantially to an organization's power bills and carbon footprint, while much of the energy is wasted. One class of work seeks to turn off servers to save power during periods of low utilization, but most of these approaches are highly constrained by data layout and incur performance penalties. Arbitrarily powering down servers that are running data-intensive applications is problematic, since it can cause data loss, weaken fault tolerance, and slow processing. This paper proposes Seadown, an SLA-aware size-scaling framework for heterogeneous MapReduce clusters. The authors first design a hybrid data layout policy that allows a large number of nodes to be turned off without data loss and provides high rebuild parallelism in case of failure. They then propose a prior-knowledge-based workload runtime estimation method that accurately predicts performance under various cluster configurations. Using this detailed information, Seadown selectively turns nodes off to minimize energy consumption while meeting the performance requirement. The authors prove the NP-hardness of the targeted problem and propose a fine-grained heuristic algorithm for powering down servers. Comprehensive experiments show that the relative errors of the runtime estimation are mostly below 6% and that the Seadown framework can cut a large portion of energy consumption while meeting the performance requirement.
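To make the idea of SLA-aware selective power-down concrete, the following is a minimal sketch and not the Seadown algorithm itself, whose details the abstract does not give. Everything in it is an assumption made for illustration: the `Node` fields, the `holds_primary` flag (a stand-in for whatever nodes the data layout requires to stay on), the toy runtime model `work / total speed`, and the greedy "least efficient first" order.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    name: str
    speed: float                 # hypothetical: work units processed per second
    power: float                 # hypothetical: watts drawn while powered on
    holds_primary: bool = False  # hypothetical: must stay on to keep data available


def estimate_runtime(work: float, nodes: List[Node]) -> float:
    """Toy runtime model (an assumption): work divided by aggregate speed."""
    total_speed = sum(n.speed for n in nodes)
    return float("inf") if total_speed == 0 else work / total_speed


def greedy_power_down(nodes: List[Node], work: float, deadline: float) -> List[Node]:
    """Return the nodes left on after greedily switching off inefficient ones.

    A node is switched off only if the estimated runtime of the remaining
    nodes still meets the deadline (the SLA in this sketch).
    """
    active = list(nodes)
    # Try the least energy-efficient nodes (lowest speed per watt) first.
    candidates = sorted(
        (n for n in nodes if not n.holds_primary),
        key=lambda n: n.speed / n.power,
    )
    for node in candidates:
        trial = [m for m in active if m is not node]
        if estimate_runtime(work, trial) <= deadline:
            active = trial
    return active
```

Calling `greedy_power_down(nodes, work, deadline)` with a list of `Node` objects returns the subset that stays powered on. Because the underlying problem is stated to be NP-hard, a greedy pass like this one gives no optimality guarantee; it only illustrates the trade-off between energy saved and the runtime constraint.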