Making full use of local disk I/O resources is key to improving the performance of a computing cluster, yet most scheduling algorithms in Hadoop do not take this factor into account. To address this, a new task selection strategy is proposed that introduces disk workload as a weighting parameter in Map task selection. During task scheduling, the scheduler refers to the workload of each disk to choose an appropriate task, so that the disks on each data node stay relatively balanced. A new task selection module based on this strategy is designed and integrated into the Hadoop task scheduler. To further improve Hadoop's performance, approximately fully local execution of Map jobs is also implemented. Experimental results show that the proposed strategy makes full use of the local disk I/O resources of data nodes, reducing node I/O Wait by 5% on average, increasing CPU utilization by 15% on average, and shortening job execution time by 20%.
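The disk-aware selection step can be illustrated with a minimal Python sketch. This is not the authors' module. It assumes that each candidate node-local Map task knows which local disk stores its input block. It also uses a hypothetical load metric: the number of Map tasks currently reading from each disk. `MapTask`, `select_map_task`, and `assign` are illustrative names, not Hadoop APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MapTask:
    task_id: str
    disk_id: str  # local disk holding this task's input block (assumed known)


def select_map_task(candidates: list[MapTask],
                    disk_load: dict[str, int]) -> Optional[MapTask]:
    """Pick the candidate Map task whose input disk is least loaded.

    disk_load maps disk_id -> number of Map tasks currently reading from it
    (a hypothetical load metric; disks missing from the dict count as idle).
    min() returns the first minimal element, so ties keep the scheduler's
    original candidate order.
    """
    if not candidates:
        return None
    return min(candidates, key=lambda t: disk_load.get(t.disk_id, 0))


def assign(candidates: list[MapTask],
           disk_load: dict[str, int],
           slots: int) -> list[MapTask]:
    """Fill up to `slots` free Map slots, updating disk load after each pick."""
    chosen: list[MapTask] = []
    remaining = list(candidates)
    for _ in range(slots):
        task = select_map_task(remaining, disk_load)
        if task is None:
            break  # no more local tasks to schedule
        chosen.append(task)
        remaining.remove(task)
        # The chosen task will now read from its disk, so raise that disk's load
        disk_load[task.disk_id] = disk_load.get(task.disk_id, 0) + 1
    return chosen


if __name__ == "__main__":
    tasks = [MapTask("m1", "disk0"), MapTask("m2", "disk0"),
             MapTask("m3", "disk1"), MapTask("m4", "disk2")]
    load = {"disk0": 2, "disk1": 0, "disk2": 1}
    picked = assign(tasks, load, 3)
    print([t.task_id for t in picked])  # ['m3', 'm4', 'm1']
    print(load)  # {'disk0': 3, 'disk1': 1, 'disk2': 2}
```

Updating the load after every pick is what spreads successive tasks across disks. Without that update, a node with several free slots would send all of them to the same idle disk and recreate the imbalance the strategy is meant to avoid.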