Stream data varies widely in size, and its flow rate changes dynamically with strong bursts. To address these characteristics, we propose a scalable, dynamic MapReduce computation model that supports online processing of large-scale dynamic and static data. Building on an Event push mode and Netty's underlying asynchronous communication, we establish an online MapReduce data transmission mechanism and implement a prototype of it. The mechanism addresses fast online transfer of large-scale distributed computing programs and data distribution, supports dynamic distribution of stream data, and thereby provides support for the dynamic MapReduce model. Experimental results show that, compared with the traditional socket pipeline transfer used in the HadoopOnline system, the method improves the efficiency of data transfer between jobs and thus the real-time performance of large-scale stream data processing.
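To make the push-based transfer idea concrete, the following is a minimal in-process sketch, not the prototype described in this paper. It uses Python's standard `asyncio` queues as a stand-in for Netty channels, and all names (`mapper`, `reducer`, `run_job`, `word_count`), the word-count workload, the CRC32 partitioning, and the `None` end-of-stream marker are assumptions made for illustration. What it shows is the general pattern: map output is pushed to reducers as events the moment it is produced, and reducers fold those events into running results instead of waiting for a completed map phase.

```python
import asyncio
from collections import Counter
from zlib import crc32


async def mapper(lines, reducer_queues):
    """Emit (word, 1) events and push each one to its reducer immediately."""
    num_reducers = len(reducer_queues)
    for line in lines:
        for word in line.split():
            # Stable hash partitioning: the same word always reaches the same reducer.
            target = crc32(word.encode("utf-8")) % num_reducers
            await reducer_queues[target].put((word, 1))
        # Yield control so reducers can consume while mapping continues.
        await asyncio.sleep(0)


async def reducer(queue, counts):
    """Fold pushed events into running counts until the end-of-stream marker."""
    while True:
        event = await queue.get()
        if event is None:  # end-of-stream marker (a convention of this sketch)
            break
        word, increment = event
        counts[word] += increment


async def run_job(splits, num_reducers=2):
    """Run one mapper per input split and num_reducers reducers concurrently."""
    queues = [asyncio.Queue() for _ in range(num_reducers)]
    partials = [Counter() for _ in range(num_reducers)]
    reducers = [
        asyncio.create_task(reducer(q, c)) for q, c in zip(queues, partials)
    ]
    # Mappers run concurrently and push events as they are produced.
    await asyncio.gather(*(mapper(split, queues) for split in splits))
    # Signal end of stream to every reducer, then wait for them to drain.
    for q in queues:
        await q.put(None)
    await asyncio.gather(*reducers)
    # Merge the per-reducer partial results into one result.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(splits, num_reducers=2):
    """Synchronous entry point for the sketch."""
    return asyncio.run(run_job(splits, num_reducers))
```

Calling `word_count([["a b a", "c"], ["b a"]], num_reducers=3)` runs two mappers and three reducers in one process. In a distributed setting each queue would correspond to a network channel between a map task and a reduce task, which is the role the asynchronous Netty transport plays in the mechanism summarized above.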