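In real-world Hadoop systems, minimizing job completion time is an NP-complete problem. The main cause is that a MapReduce computation migrates large volumes of data from Map nodes to Reduce nodes, which easily congests the network and makes data migration take too long. Software-defined networking (SDN) separates routing control from data forwarding, lets switches process the data in the network flexibly, and gives the controller knowledge of the global network topology; its centralized management model makes it possible to optimize Hadoop performance. Exploiting SDN's flexible control of the network, Map intermediate values are merged on OpenFlow switches, which reduces data traffic and data migration time and improves Hadoop's efficiency.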
In practical Hadoop systems, minimizing job completion time is an NP-complete problem. The main reason for this problem is the long time spent transmitting massive amounts of data from the Map nodes to the Reduce nodes, which is caused by network congestion. Software-defined networking (SDN) completely strips the control plane out of the switch and migrates it into the controller, enabling the switches to handle the data in the network and the controller to know the global network, which makes MapReduce network optimization possible in the OpenFlow network structure. This paper proposes a method of merging the Map intermediate data on the OpenFlow switch, using SDN's flexible control of the network, to reduce the time for data transmission.
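The merging idea can be illustrated with a minimal sketch. It is not the paper's implementation and uses no OpenFlow API. It assumes a word-count style job whose intermediate values are combined by summation (an associative, commutative operation), and the names `merge_at_switch` and `pair_count` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One intermediate (key, value) pair emitted by a Map task.
Pair = Tuple[str, int]


def merge_at_switch(map_outputs: List[List[Pair]]) -> List[Pair]:
    """Combine pairs that share a key across several Map outputs.

    Assumption: values are merged by summation, as in word count.
    The merged pairs stand in for what a switch would forward on.
    """
    merged: Dict[str, int] = defaultdict(int)
    for output in map_outputs:
        for key, value in output:
            merged[key] += value
    return sorted(merged.items())


def pair_count(map_outputs: List[List[Pair]]) -> int:
    """Number of pairs that would cross the network without merging."""
    return sum(len(output) for output in map_outputs)


if __name__ == "__main__":
    # Hypothetical outputs of three Map nodes attached to one switch.
    outputs = [
        [("hadoop", 2), ("sdn", 1)],
        [("hadoop", 3), ("switch", 1)],
        [("sdn", 4), ("switch", 2)],
    ]
    merged = merge_at_switch(outputs)
    print("pairs before merging:", pair_count(outputs))
    print("pairs after merging:", len(merged))
    print(merged)
```

In this toy input six intermediate pairs collapse to three, which is the kind of traffic reduction the abstract attributes to merging on the switch. The actual saving depends on how often keys repeat across Map nodes.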