In Hadoop MapReduce environments, knowing the execution time of a job in advance allows more appropriate decisions to be made in resource allocation, task scheduling, and load balancing, thereby improving system performance. After analyzing the execution pattern of Hadoop MapReduce jobs, this paper proposes an online method for predicting job execution time. Drawing on historical information, the method predicts the execution time online from the job's progress in its different phases. The method has been implemented in Hadoop-0.20.2 and evaluated on a Linux cluster of 19 nodes. Experimental results show that, in the best case, the error between the predicted and actual execution times is about 2%.