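This paper studies how a high-performance computing cluster can save energy effectively during operation by shutting down idle nodes, and designs and implements a node start/stop mechanism based on statistical monitoring of task load. Based on records and analysis of job execution and queuing in the system, a load factor reflecting the state of the queued tasks is designed through parameter estimation. Concrete policies are built around this load factor and, combined with the job system's queue settings and resource allocation rules, are used to start and stop idle nodes in the cluster automatically. Simulation experiments show that the mechanism can effectively and automatically start and stop idle nodes, thereby reducing system power consumption, while having essentially no effect on the overall completion time of jobs in the system.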
This paper discusses shutting down idle nodes to save power in high-performance computing clusters. It proposes a mechanism for node start/stop control based on task load monitoring and statistics, and designs a task load indicator using parameter estimation. Detailed strategies built around this indicator automatically control the starting and stopping of idle nodes, taking into account the queue configuration and resource allocation rules of the job management system. Simulation tests indicate that the node start/stop mechanism can effectively control the idle nodes in the system, so that power consumption can be reduced; moreover, the mechanism has very little impact on overall job scheduling and execution.
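As a rough illustration of the kind of load-driven start/stop control described above, the following minimal sketch smooths a demand-to-capacity ratio into a load factor and compares it against two thresholds. The abstract does not give the form of the load factor or the policy rules, so everything here is an assumption made for illustration: the `LoadMonitor` class, the exponential smoothing, and the values of `alpha`, `start_threshold`, and `stop_threshold` are hypothetical and are not the paper's method.

```python
from dataclasses import dataclass


@dataclass
class LoadMonitor:
    """Tracks a smoothed load factor for one queue (hypothetical design)."""

    alpha: float = 0.5            # smoothing weight (assumed value)
    start_threshold: float = 0.8  # power nodes on above this (assumed value)
    stop_threshold: float = 0.3   # power nodes off below this (assumed value)
    load_factor: float = 0.0

    def update(self, requested_nodes: int, powered_nodes: int) -> float:
        """Blend the newest demand/capacity ratio into the running estimate.

        requested_nodes: nodes requested by running plus queued jobs.
        powered_nodes: nodes currently switched on.
        """
        ratio = requested_nodes / max(powered_nodes, 1)
        self.load_factor = (
            self.alpha * ratio + (1 - self.alpha) * self.load_factor
        )
        return self.load_factor

    def decide(self, idle_nodes: int, off_nodes: int) -> str:
        """Return 'start', 'stop', or 'hold' for the next control step."""
        if self.load_factor > self.start_threshold and off_nodes > 0:
            return "start"
        if self.load_factor < self.stop_threshold and idle_nodes > 0:
            return "stop"
        return "hold"


monitor = LoadMonitor()
monitor.update(requested_nodes=16, powered_nodes=8)
first_action = monitor.decide(idle_nodes=0, off_nodes=4)
```

Keeping the start threshold well above the stop threshold gives a hysteresis band, which is one simple way to avoid switching nodes on and off repeatedly when the load hovers near a single cutoff.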