频繁子图挖掘是图挖掘的一个重要研究课题.gSpan算法作为一种高效的子图挖掘算法具有较好的执行效率,它通过最右扩展生成频繁子图,但不能保证每次扩展得到的均为标准编码.针对此问题本文提出了一种改进的算法CSGM,它采用ADI++存储结构,能处理更大规模的图集,同时保证每次最右扩展均生成标准编码,既避免了对非标准编码图的支持度计算,也避免了对输入编码是否为标准编码的计算.在实际数据集上运行的实验结果表明它比原算法提高了挖掘效率.
Frequent graph mining is an important research subject of graph mining. As an efficient algorithm for subgraph mining, the gSpan has better efficiency through fight-most extension to generate frequent subgraph. But it can't guarantee each extension is a canonical code. An improved algorithm called CSGM is proposed in this paper to resolve the existing problem. It can deal with larger graph dataset through using the ADI + + storage structure and guarantee that each extension is a canonical code. The advantage is that it not only avoids the calculation of the support of the non-canonical code graphs but also the calculation of whether an input code is a canonical code. The experimental result on real datasets shows that the new algorithm improves the efficiency of mining.