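Distributed computing systems such as MapReduce generate heavy east-west traffic inside data centers, a large share of which is correlated traffic typified by incast and shuffle, and this traffic in turn severely degrades the performance of upper-layer applications. This has prompted researchers to consider performing inter-flow data aggregation during the in-network transmission phase of such correlated traffic, as early as possible rather than only at the receiver. We first discuss the feasibility and gain of inter-flow data aggregation in the context of novel data center network structures. To maximize this gain, we build a minimum-cost tree model for incast transmission. To solve this model, we propose two approximate incast tree construction methods, which can generate an efficient incast tree based solely on the locations of the incast members and the data center topology, and we further address the dynamic and fault-tolerance issues of incast trees. Finally, we evaluate the in-network aggregation method for incast traffic using a prototype system and large-scale simulations. The experimental results show that the method significantly reduces the transmission overhead caused by incast traffic and saves network resources in the data center. Moreover, the proposed model and solution methods are also applicable to other data center network structures.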
Data transfers such as the common shuffle and incast communication patterns contribute most of the network traffic in MapReduce-like working paradigms and thus have a severe impact on application performance in modern data centers. This motivates us to seek opportunities for performing inter-flow data aggregation during the transmission phase, as early as possible rather than only at the receiver side. In this paper, we first examine the gain and feasibility of inter-flow data aggregation in novel data center network structures. To achieve such a gain, we model the minimal incast tree problem. We propose two approximate incast tree construction methods, RS-based and ARS-based incast trees, which generate an efficient incast tree based solely on the labels of the incast members and the data center topology. We further present incremental methods to tackle the dynamic and fault-tolerance issues of the incast tree. Based on a prototype implementation and large-scale simulations, we demonstrate that our approach can significantly decrease the amount of network traffic, save data center resources, and reduce the delay of job processing. Our approach for BCube and FBFLY can be adapted to other data center structures with minimal modifications.
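To make the aggregation gain concrete, the following is a minimal Python sketch, not the paper's RS-based or ARS-based construction. It assumes a BCube(n,k) topology modeled at the server level, where servers carry (k+1)-digit base-n labels and two servers are one hop apart (server, switch, server, i.e. two physical links) when their labels differ in exactly one digit. It also assumes ideal aggregation, so that the merged data on each link is no larger than a single flow. The helper names (`bcube_path`, `unicast_cost`, `aggregation_tree_cost`, `best_order`) and the "try every digit-correction order" heuristic are hypothetical and chosen only for illustration.

```python
from itertools import permutations

# Assumption: BCube servers are labeled by k+1 base-n digits, and two servers
# are one server-level hop apart (server-switch-server, i.e. 2 physical links)
# when their labels differ in exactly one digit.


def bcube_path(src, dst, order):
    """Return the server-level path from src to dst that corrects one
    differing digit per hop, visiting digit positions in `order`."""
    cur = list(src)
    path = [tuple(cur)]
    for i in order:
        if cur[i] != dst[i]:
            cur[i] = dst[i]
            path.append(tuple(cur))
    return path


def unicast_cost(senders, receiver):
    """Server-level hops used when every sender's flow travels separately."""
    return sum(sum(a != b for a, b in zip(s, receiver)) for s in senders)


def aggregation_tree_cost(senders, receiver, order):
    """Server-level hops used when flows are merged wherever their paths meet.
    With a fixed digit order the next hop depends only on the current server,
    so the union of all paths is a tree rooted at the receiver; each of its
    edges carries one (ideally aggregated) flow."""
    edges = set()
    for s in senders:
        p = bcube_path(s, receiver, order)
        edges.update(zip(p, p[1:]))
    return len(edges)


def best_order(senders, receiver):
    """Hypothetical heuristic: try every digit-correction order and keep
    the order whose tree uses the fewest server-level hops."""
    return min(
        ((order, aggregation_tree_cost(senders, receiver, order))
         for order in permutations(range(len(receiver)))),
        key=lambda pair: pair[1],
    )


if __name__ == "__main__":
    receiver = (0, 0)                      # BCube(4,1): 2-digit, base-4 labels
    senders = [(1, 1), (2, 1), (3, 1), (1, 2)]
    print(unicast_cost(senders, receiver))                    # 8
    print(aggregation_tree_cost(senders, receiver, (1, 0)))   # 7
    print(best_order(senders, receiver))                      # ((0, 1), 6)
```

In this toy incast, four separate unicast flows use 8 server-level hops, while merging flows where their paths meet uses 7 or 6 hops depending on the digit-correction order. The difference between the two orders illustrates why the shape of the incast tree, and not only in-network aggregation itself, determines how much traffic is saved.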