On the Hadoop platform, the task scheduler is aware of data location and schedules tasks so that data can be processed locally. Targeting this feature, this paper explores a data access monitoring mechanism for Map tasks in Hadoop. It argues that data access monitoring on the Hadoop platform should serve not only to improve the efficiency of data access, but also to improve the execution efficiency of parallel Map/Reduce jobs, and it therefore adds monitoring of how evenly data access overhead is balanced across multiple Map tasks executing in parallel. Based on this idea, the granularity and the information set of data access monitoring on the Hadoop platform are defined. A master-slave monitoring architecture is designed on top of Hadoop's existing structure, and the implementation techniques of the main monitoring function modules are presented together with test results.