东篱科研大数据发现系统（DRDS）

位置：成果数据库 > 期刊 > 期刊详情页

Bipartite-Oriented Distributed Graph Partitioning for Big Learning

ISSN号：1000-9000
期刊名称：《计算机科学技术学报：英文版》
时间：0
分类：TP391.41[自动化与计算机技术—计算机应用技术;自动化与计算机技术—计算机科学与技术] TP18[自动化与计算机技术—控制科学与工程;自动化与计算机技术—控制理论与控制工程]
作者机构：[1]Shanghai Key Laboratory of Scalable Computing and Systems, Institute of Parallel and Distributed Systems Shanghai Jiao Tong University, Shanghai 200240, China
相关基金：This work was supported in part by the Doctoral Fund of Ministry of Education of China under Grant No. 20130073120040, the Program for New Century Excellent Talents in University of Ministry of Education of China, the Shanghai Science and Technology Developnmnt hinds under Grant No. 12QA1401700, a foundation for the Author of National Excellent Doctoral Dissertation of China, the Open Project Program of the State Key Laboratory of Mathematical Engineering and Advanced Computing under Grant No. 2014A05, the National Natural Science Foundation of China under Grant Nos. 61003002, 61402284, the Shanghai Science and Technology Development Fund for High-Tech Achievement Translation under Grant No. 14511100902, and the Singapore National Research Foundation under Grant No. CREATE E2S2.

关键词：分布式图形, 机器学习, 分区, 分割算法, 网络通信, 计算负荷, 不平衡数据, 数据挖掘, bipartite graph, graph partitioning, graph-parallel system

中文摘要：

象建议，当模特儿的话题，和医药诊断一样的许多机器学习和数据采矿(MLDM ) 问题能在由两部组成的图上作为计算被建模。然而，很分布式的图平行系统对在这的唯一的特征忘却图和存在的联机图划分算法通常在网络通讯上象重要压力一样引起顶点的过多的复制。这篇文章识别为分布式的 MLDM 处理划分由两部组成的图的挑战和机会并且建议 BiGraph，划分算法的一套由两部组成面向的图。BiGraph 力量观察象数据在导出一套最佳的图的顶点的二个子集之间缩放划分导致最小的顶点复制和网络通讯的算法的顶点，区别计算负担和 imbalanced 的扭曲的分发那样。BiGraph 在 PowerGraph 上被实现了并且被显示有表演增加直到 17.75X (从 1.16X ) 为四个典型 MLDM 算法，由于减少直到 80% 顶点复制，并且直到 96% 网络交通。

英文摘要：

Many machine learning and data mining （MLDM] problems like recommendation, topic modeling, and medical diagnosis can be modeled as computing on bipartite graphs. However, inost distributed graph-parallel systems are oblivious to the unique characteristics in such graphs and existing online graph partitioning algorithms usually cause excessive repli- cation of vertices as well as significant pressure on network communication. This article identifies the challenges and oppor- tunities of partitioning bipartite graphs for distributed MLDM processing and proposes BiGraph, a set of bipartite-oriented graph partitioning algorithms. BiGraph leverages observations such as the skewed distribution of vertices, discriminated computation load and imbalanced data sizes between the two subsets of vertices to derive a set of optimal graph partition- ing algorithms that result in minimal vertex replication and network communication. BiGraph has been implemented on PowerGraph and is shown to have a performance boost up to 17.75X （from 1.16X） for four typical MLDM algorithnls, due to reducing up to 80% vertex replication, and up to 96% network traffic.

同期刊论文项目

基于动态差异化策略的高效大规模图计算系统研究

期刊论文 2

面向多核平台的日用操作系统性能可伸缩性研究

期刊论文 3 会议论文 9 专利 2

同项目期刊论文

Mercury: Combining Performance with Dependability Using Self-Virtualization

Tiled-MapReduce: Efficient and Flexible MapReduce Processing on Multicore with Tiling

Java虚拟机在Intel众核架构下的动态数据预取研究与优化

期刊信息

《计算机科学技术学报：英文版》
中国科技核心期刊

主管单位:
主办单位:中国科学院计算机技术研究所
主编：
地址：北京2704信箱
邮编：100080
邮箱：jcst@ict.ac.cn
电话：010-62610746 64017032

国际标准刊号：ISSN：1000-9000
国内统一刊号：ISSN：11-2296/TP
邮发代号:2-578

获奖情况:

国内外数据库收录:

被引量:505