An Efficient Algorithm for Mining Frequent Itemsets over Uncertain Data Streams

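Journal: Journal of Computer Research and Development (计算机研究与发展)
Date: 0
Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] School of Information Science and Technology, Sun Yat-sen University, Guangzhou 510275; [2] School of Computer Science, South China Normal University, Guangzhou 510631; [3] Information Center, China Southern Power Grid, Guangzhou 510000
Funding: National Natural Science Foundation of China (61033010, 61070005); Natural Science Foundation of Guangdong Province (S2011020001182); Science and Technology Planning Project of Guangdong Province (2009A080207005, 2009B090300450, 2010A040303004)
Related project: Research on Skyline Analysis Based on Uncertain Time Series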

Keywords: frequent subgraph pattern, graph classification, graph mining, feature selection, embedding set, data mining

Chinese abstract (translated):

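As graph data collection technology advances in many scientific fields, graph classification has become an important topic in machine learning and data mining. Many graph classification methods have been proposed: some build the classification model in three steps, others in two. When mining frequent subgraphs or feature subgraphs, these methods consider only the structural information of a subgraph and ignore its embedding information. To address this, building on the L-CCAM subgraph encoding, a graph classification method based on embedding sets is proposed. The method adopts a feature subgraph selection strategy based on category information. It considers the structural information of subgraphs and also makes full use of embedding information (embedding sets) during frequent subgraph mining, so that feature subgraphs are selected and classification rules are generated directly in a single step. Experimental results show that, on chemical compound data, the method achieves higher classification accuracy than three-step graph classification methods and runs more efficiently than both two-step and three-step methods.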

English abstract:

With the development of highly efficient graph data collection technology in many scientific application fields, classification of graph data has become an important topic in the machine learning and data mining community. Many graph classification approaches have been proposed. Some take three steps: mining frequent subgraphs, selecting feature subgraphs from the mined frequent subgraphs, and constructing a classification model from the selected subgraphs. Others take two steps: mining discriminative subgraphs directly from the graph data and learning a classification model from those discriminative subgraphs. However, when mining frequent or discriminative subgraphs, these approaches use only the structural information of a pattern and do not consider its embedding information, even though some efficient subgraph mining algorithms can maintain the embedding information of a pattern. We propose a graph classification approach that employs a novel subgraph encoding with category labels and adopts a feature subgraph selection strategy based on category information. During frequent subgraph mining, we make full use of embedding sets to select the feature subgraphs, and we are able to generate classification rules in a single step. Experimental results show that the proposed approach is effective and feasible for classifying chemical compounds.
