Distance querying is one of the most fundamental operations in graph data mining applications. However, most existing methods cannot efficiently handle large graphs, especially those with more than a hundred thousand vertices. To address this problem, a multilevel community-center label index is proposed. First, the vertices of the original graph are partitioned into communities. Then, the center vertices of these communities are used to construct a weighted query subgraph. This procedure is applied recursively, and finally a tree-structured label set based on community centers is built for every vertex. With a short construction time and a small storage cost, the label set greatly improves the efficiency of distance queries. Experimental results show that the overall efficiency of the approach is significantly better than that of state-of-the-art algorithms.
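The recursive construction summarized above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify how communities or centers are chosen, so the sketch assumes the highest-degree vertices serve as centers and assigns every vertex to its nearest center in place of a real community detection step. All function names and the `fanout` parameter are hypothetical. The graph is assumed to be connected and undirected, with positive edge weights stored as `graph[u][v] = w`. Under these assumptions the query returns an upper bound on the true distance, not necessarily the exact value.

```python
import heapq

def nearest_centers(graph, centers):
    """Multi-source Dijkstra: map each vertex to (closest center, distance)."""
    best = {c: (c, 0) for c in centers}
    heap = [(0, c, c) for c in centers]
    heapq.heapify(heap)
    while heap:
        d, v, c = heapq.heappop(heap)
        if best[v] != (c, d):
            continue  # stale heap entry, a shorter route was found later
        for w, wt in graph[v].items():
            nd = d + wt
            if w not in best or nd < best[w][1]:
                best[w] = (c, nd)
                heapq.heappush(heap, (nd, w, c))
    return best

def build_level(graph, fanout):
    """Group vertices around centers and build the weighted center graph."""
    # Assumption: the highest-degree vertices act as community centers.
    k = max(1, len(graph) // fanout)
    centers = sorted(graph, key=lambda v: len(graph[v]), reverse=True)[:k]
    assign = nearest_centers(graph, centers)
    coarse = {c: {} for c in centers}
    for u, nbrs in graph.items():
        cu, du = assign[u]
        for v, wt in nbrs.items():
            cv, dv = assign[v]
            if cu != cv:
                # Length of the walk cu -> u -> v -> cv in the current graph.
                w = du + wt + dv
                if w < coarse[cu].get(cv, float("inf")):
                    coarse[cu][cv] = w
                    coarse[cv][cu] = w
    return assign, coarse

def build_labels(graph, fanout=2):
    """Give each vertex a chain of (center, distance) pairs, one per level."""
    labels = {v: [(v, 0)] for v in graph}
    level = graph
    while len(level) > 1:
        assign, level = build_level(level, fanout)
        for chain in labels.values():
            top, dist = chain[-1]
            c, d = assign[top]
            chain.append((c, dist + d))
    return labels

def query(labels, u, v):
    """Upper bound on dist(u, v), routed through a center shared by both chains."""
    best = float("inf")
    for (cu, du), (cv, dv) in zip(labels[u], labels[v]):
        if cu == cv:
            best = min(best, du + dv)
    return best
```

Every label chain has one entry per level, so under these assumptions a query only compares two short lists instead of traversing the graph. The price in this simplified version is accuracy: two adjacent vertices that fall into different groups are routed through a higher-level center, which can overestimate their distance.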