Dongli Research Big Data Discovery System (DRDS)


An Efficient Large-Scale Graph Data Processing Mechanism in a Spark Environment (一种Spark环境下的高效率大规模图数据处理机制)

ISSN: 1001-3695
Journal: Application Research of Computers (《计算机应用研究》)
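Classification (CLC): TP391 [Automation and Computer Technology > Computer Application Technology; Automation and Computer Technology > Computer Science and Technology]
Author affiliation: School of Information, Yunnan University, Kunming 650091
Funding: National Natural Science Foundation of China (61170222)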

Keywords: graph computing, memory computing, graph database, Hadoop, Spark, PageRank

Abstract (Chinese, translated):

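To address the low efficiency of existing graph processing and graph management frameworks and the problems with their data storage structures, this paper proposes a processing mechanism suited to large-scale graph data. It first analyzes the strengths and weaknesses of several current graph processing models and graph storage frameworks. Next, based on an analysis of the characteristics of distributed computing, it designs a graph data processing framework around three aspects: a partitioning algorithm suited to large-scale graphs, optimization and caching of data extraction, and a mechanism that combines the computation layer with the persistence layer. Finally, experiments built on the PageRank and SSSP algorithms compare its performance with the MapReduce framework and with a Spark framework that uses HDFS as its persistence layer. The experiments show that the proposed framework is 90 times faster than MapReduce and 2 times faster than Spark with HDFS as the persistence layer, indicating that it can meet the needs of high-efficiency graph data processing.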
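PageRank is one of the two benchmark workloads named in the abstract. The abstract does not describe the authors' implementation, so the following is only a minimal sketch, in plain Python rather than Spark, of the iterative PageRank formulation commonly used on Spark: the static link structure is hash-partitioned once and reused in every iteration (the role `partitionBy(...).cache()` plays on a Spark RDD), and each iteration sends rank shares along edges and sums them per vertex. The function names `hash_partition` and `pagerank`, the damping factor 0.85, and the sample graph are hypothetical illustrations, not taken from the paper.

```python
from collections import defaultdict


def hash_partition(links, num_partitions):
    """Split adjacency lists into num_partitions buckets by source vertex,
    in the spirit of Spark's HashPartitioner (hash of key modulo partitions)."""
    parts = [{} for _ in range(num_partitions)]
    for src, dsts in links.items():
        parts[hash(src) % num_partitions][src] = dsts
    return parts


def pagerank(links, num_partitions=4, damping=0.85, iterations=100):
    """Iterative PageRank over a dict mapping vertex -> list of out-neighbors."""
    vertices = set(links)
    for dsts in links.values():
        vertices.update(dsts)
    n = len(vertices)
    # The link structure never changes, so it is partitioned once and reused
    # in every iteration (assumed analogue of partitionBy(...).cache()).
    parts = hash_partition(links, num_partitions)
    ranks = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Rank held by vertices with no out-links is spread uniformly.
        dangling = sum(r for v, r in ranks.items() if not links.get(v))
        contribs = defaultdict(float)
        # "Map" step: each partition sends rank shares along its edges.
        for part in parts:
            for src, dsts in part.items():
                if dsts:
                    share = ranks[src] / len(dsts)
                    for dst in dsts:
                        contribs[dst] += share
        # "Reduce" step: sum shares per vertex and apply the damping factor.
        ranks = {
            v: (1 - damping) / n + damping * (contribs[v] + dangling / n)
            for v in vertices
        }
    return ranks


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for vertex, rank in sorted(pagerank(graph).items()):
        print(vertex, round(rank, 4))
```

Because Python randomizes `hash()` for strings per process, the assignment of vertices to partitions can differ between runs; the computed ranks do not, since every partition is visited in each iteration.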

Abstract (English):

To address the inefficiency of existing frameworks for processing, storing, and managing graph data, this paper proposes a feasible mechanism for processing large-scale graph data. It first reviews the advantages and shortcomings of existing graph processing models and graph data storage frameworks. Then, based on an analysis of the characteristics of distributed computing, it implements a new graph data framework with three main parts: a partitioning algorithm for large-scale graphs, optimization and caching of data extraction, and a mechanism that combines the computation layer with the persistence layer. Using the PageRank and SSSP (single-source shortest path) algorithms, it compares the performance of the proposed framework with that of MapReduce and of Spark using HDFS as the persistence layer. Results show that the proposed framework is more than 90 times faster than MapReduce and 2 times faster than Spark with HDFS, and that it can meet the needs of high-performance graph data processing.
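SSSP is the second benchmark. The abstract does not say how the authors implement it; one common way to express it in vertex-centric systems on Spark (for example GraphX's Pregel operator) is repeated edge relaxation in which only vertices whose distance just improved send messages. The sketch below shows that generic pattern in plain Python. The function name `sssp`, the weighted adjacency-list input format, and the restriction to non-negative edge weights are assumptions for illustration only.

```python
import math


def sssp(edges, source):
    """Single-source shortest paths by superstep-style edge relaxation.

    edges: dict mapping vertex -> list of (neighbor, non-negative weight).
    Only vertices whose distance improved in the previous superstep send
    messages, as in a Pregel-style vertex program.
    """
    vertices = set(edges) | {v for nbrs in edges.values() for v, _ in nbrs}
    vertices.add(source)
    dist = {v: math.inf for v in vertices}
    dist[source] = 0.0
    active = {source}
    while active:
        # Send: each active vertex proposes dist[u] + w to every neighbor,
        # keeping only the smallest proposal per target (a min-combiner).
        messages = {}
        for u in active:
            for v, w in edges.get(u, []):
                cand = dist[u] + w
                if cand < messages.get(v, math.inf):
                    messages[v] = cand
        # Receive: a vertex updates and stays active only if it improved.
        active = set()
        for v, cand in messages.items():
            if cand < dist[v]:
                dist[v] = cand
                active.add(v)
    return dist
```

Limiting messages to the active frontier keeps later supersteps cheap: once most distances have settled, few vertices remain active, and the loop ends when no distance improves. Vertices unreachable from the source keep a distance of infinity.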
