Virtualization, as a new resource management technology, is increasingly used in high-energy physics. Static virtual machine clusters can no longer meet the dynamic computing resource demands of multiple job queues. To address this problem, an elastic computing resource management system for multiple job queues in a cloud computing environment was designed and implemented. The system runs computing jobs through the high-throughput computing system HTCondor and manages virtual computing nodes with the open-source cloud computing platform OpenStack. An elastic resource management algorithm based on dual thresholds, combined with a virtual resource quota service, scales the resource pool as a whole, and a two-level buffer pool is designed to improve scaling efficiency. The system is currently deployed on IHEPCloud, the public service cloud of the Institute of High Energy Physics (IHEP). Results from actual operation show that the system dynamically adjusts the number of virtual computing nodes in each queue as resource demand changes, and that the CPU utilization of the computing resources is significantly higher than with traditional resource management.
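The abstract names a dual-threshold algorithm constrained by a quota service but does not give its details. The following is only a minimal sketch of how such a decision rule could look, not the system's actual implementation. Every name here (`QueueState`, `scaling_decision`), the threshold values 0.9 and 0.5, and the simplification of one job per virtual node are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class QueueState:
    """Snapshot of one job queue (all fields are illustrative assumptions)."""
    idle_jobs: int      # jobs waiting in the queue
    running_nodes: int  # virtual nodes currently serving the queue
    busy_nodes: int     # nodes currently running a job
    quota: int          # maximum nodes the quota service grants this queue


def scaling_decision(state: QueueState,
                     upper: float = 0.9,
                     lower: float = 0.5) -> int:
    """Return the change in node count for one queue.

    Positive means add nodes, negative means release nodes, 0 means hold.
    Assumes one job per node; thresholds are hypothetical defaults.
    """
    if state.running_nodes == 0:
        # No nodes yet: start as many as there are waiting jobs, within quota.
        return max(0, min(state.idle_jobs, state.quota))

    utilization = state.busy_nodes / state.running_nodes

    if utilization >= upper and state.idle_jobs > 0:
        # Above the upper threshold with a backlog: grow, capped by the quota.
        headroom = state.quota - state.running_nodes
        return max(0, min(state.idle_jobs, headroom))

    if utilization <= lower and state.idle_jobs == 0:
        # Below the lower threshold with no backlog: release the idle nodes.
        return -(state.running_nodes - state.busy_nodes)

    # Between the two thresholds: leave the queue unchanged.
    return 0
```

Under these assumptions, a fully busy queue of 10 nodes with 5 waiting jobs and a quota of 12 would grow by 2 nodes, while a queue with 3 of 10 nodes busy and no waiting jobs would release 7. The gap between the two thresholds is what keeps the pool from oscillating when utilization hovers near a single cut-off.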