To address the low efficiency and storage-structure limitations of existing graph processing and graph management frameworks, this paper proposes a processing mechanism suited to large-scale graph data. It first analyzes the strengths and shortcomings of current graph processing models and graph storage frameworks. Then, based on an analysis of the characteristics of distributed computing, it designs a graph data processing framework around three components: a partitioning algorithm suited to large-scale graphs, caching and optimized data extraction, and a mechanism that couples the computation layer with the persistence layer. Finally, experiments using the PageRank and SSSP (single-source shortest path) algorithms compare the proposed framework with MapReduce and with Spark using HDFS as its persistence layer. The results show that the proposed framework is 90 times faster than MapReduce and 2 times faster than Spark with HDFS, indicating that it can meet the demands of high-efficiency graph data processing.