Many machine learning and data mining (MLDM) problems, such as recommendation, topic modeling, and medical diagnosis, can be modeled as computations on bipartite graphs. However, most distributed graph-parallel systems are oblivious to the unique characteristics of such graphs, and existing online graph partitioning algorithms usually cause excessive replication of vertices as well as significant pressure on network communication. This article identifies the challenges and opportunities of partitioning bipartite graphs for distributed MLDM processing and proposes BiGraph, a set of bipartite-oriented graph partitioning algorithms. BiGraph leverages observations such as the skewed distribution of vertices, the discriminated computation load, and the imbalanced data sizes between the two subsets of vertices to derive a set of optimal graph partitioning algorithms that result in minimal vertex replication and network communication. BiGraph has been implemented on PowerGraph and is shown to deliver performance speedups ranging from 1.16X to 17.75X for four typical MLDM algorithms, owing to reductions of up to 80% in vertex replication and up to 96% in network traffic.
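To make "vertex replication" concrete, here is a minimal Python sketch. It assumes a vertex-cut model in which each edge is placed on exactly one partition and a vertex is copied to every partition that holds one of its edges. It compares a random edge placement against a simple hypothetical rule that places each edge according to its endpoint in the larger ("favorite") vertex subset. The function names, the synthetic user/item graph, and the placement rule are illustrative assumptions, not BiGraph's actual algorithms.

```python
import random
from collections import defaultdict


def vertex_partitions(edges, assignment):
    """Map each vertex to the set of partitions holding at least one of its edges."""
    parts = defaultdict(set)
    for (u, v), p in zip(edges, assignment):
        parts[u].add(p)
        parts[v].add(p)
    return parts


def replication_factor(edges, assignment):
    """Average number of copies per vertex under a vertex-cut (edge) partitioning."""
    parts = vertex_partitions(edges, assignment)
    return sum(len(s) for s in parts.values()) / len(parts)


def random_edge_assignment(edges, k, seed=0):
    """Baseline: place every edge on a uniformly random partition."""
    rng = random.Random(seed)
    return [rng.randrange(k) for _ in edges]


def favorite_subset_assignment(edges, k):
    """Hypothetical bipartite-aware rule: place each edge on the partition chosen by
    its endpoint from the larger ("favorite") subset, assumed to be edge[0].
    Every favorite-subset vertex then lives in exactly one partition."""
    return [u % k for (u, _) in edges]


def make_bipartite(num_users=1000, num_items=50, degree=10, seed=1):
    """Synthetic skewed bipartite graph: many users, few items (assumed sizes)."""
    rng = random.Random(seed)
    edges = []
    for u in range(num_users):
        for item in rng.sample(range(num_items), degree):
            # Offset item ids so the two vertex subsets never share an id.
            edges.append((u, num_users + item))
    return edges


if __name__ == "__main__":
    edges = make_bipartite()
    k = 8
    rnd = replication_factor(edges, random_edge_assignment(edges, k))
    fav = replication_factor(edges, favorite_subset_assignment(edges, k))
    print(f"random placement:          {rnd:.2f}")
    print(f"favorite-subset placement: {fav:.2f}")
```

Because the larger subset dominates the vertex count, keeping its vertices unreplicated lowers the average replication factor sharply, while the smaller subset absorbs the replicas. A real partitioner would also have to balance edges and computation across partitions, which this sketch ignores.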