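With the rapid development of the Internet and the arrival of big data, cluster computing frameworks for data-intensive applications keep emerging, and each of these frameworks targets only one specific class of applications. Because of this, Internet companies often need to deploy and run multiple computing frameworks so that the best framework can be chosen for each application. Unified resource management and scheduling systems have therefore been proposed as a shared cluster platform. Such a system must support multiple different computing frameworks at the same time, so managing cluster computing resources and allocating resources fairly among the frameworks become the key technical difficulties. Jobs from different frameworks are heterogeneous, and scheduling jobs across frameworks so as to make full use of cluster resources and improve system throughput has become a new challenge. Based on the characteristics of existing resource management systems and application requirements, this paper studies and analyzes the key techniques of cluster resource management and scheduling, and discusses the problems of existing cluster resource management techniques and their future development.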
With the rapid development of the Internet and the coming of big data, the resource management system has emerged as a thin resource-sharing layer that enables a cluster to be shared across diverse cluster computing frameworks by giving the frameworks a common interface for accessing cluster resources. To power both large Internet services and a growing number of data-intensive scientific applications, cluster computing frameworks will continue to emerge, and no single framework will be optimal for all applications. Multiplexing a cluster between frameworks therefore makes a significant difference. Deploying and running multiple frameworks in the same cluster improves utilization and allows applications to share access to large datasets that may be costly to replicate across clusters. This paper surveys the current major techniques of resource management and scheduling in clusters, including the resource representation model, the resource allocation model, and the scheduling policy. Finally, it presents prominent current solutions that have been developed and used by many companies, and summarizes and contrasts these solutions from recent years.