

Research on Continuous Anti-Collusion Digital Fingerprinting

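ISSN: 1000-386X
Journal: 计算机应用与软件 (Computer Applications and Software)
Date: 0
Pages: 734-737
Language: Chinese
Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237; [2] Department of Computer Science and Technology, Shanghai Maritime University, Shanghai 200135; [3] Guanghua School of Management, Peking University, Beijing 100871; [4] School of Computer Science, Fudan University, Shanghai 200433
Funding: National Natural Science Foundation of China (60402008); Shanghai Science and Technology Commission Innovation Action Plan (08170511300); Shanghai Maritime University Science and Technology Fund (2009445654); East China University of Science and Technology Teaching Reform Fund (YH0126115)
Related project: Research on automatic recognition of facial-complexion diagnosis information based on multiple information-processing techniques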

Keywords: XML, data stream, paging, frequent subtree, data mining

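Abstract (translated from the Chinese):

As XML data streams become widely used, discovering knowledge by mining them has both theoretical and practical value. Compared with other kinds of frequent pattern mining, frequent subtree mining over large XML documents and data streams faces particular difficulties: an XML data stream cannot be parsed in memory as a whole, and mining it in segments must take the semi-structured nature of XML data into account. To address these problems, the paper proposes Tmlist, a model for mining frequent subtrees over a paged data stream. Tmlist splits the XML data stream into pages, manages cross-page nodes and the cross-page growth of frequent candidate subtrees, and mines frequent subtrees page by page. Frequent candidate subtrees grow by adding frequent candidate nodes on the rightmost path, proceeding from shallow to deep according to the level of the root node, which avoids repeated recursive growth of subtrees rooted at lower levels. Each frequent candidate subtree is identified jointly by its subtree topological sequence and its rightmost path, so growing a subtree does not require matching the subtree prefix, which removes the cost of storing and matching prefix nodes. Frequent candidate subtrees are filtered page by page against a page minimum support, and each subtree's support is decayed by a page decay factor and then used for pruning. Within a controllable error bound, Tmlist reduces the space consumed by frequent subtree mining and improves memory utilization and mining efficiency.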
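The abstract says candidate subtrees grow by adding nodes on the rightmost path but gives no data structures. The following is a minimal sketch of generic rightmost-path extension, not Tmlist itself. The preorder `(depth, label)` encoding and the names `rightmost_path` and `extend_rightmost` are assumptions made for illustration only.

```python
from typing import List, Tuple

# Hypothetical encoding: an ordered tree as a preorder list of (depth, label);
# the root has depth 0. This is an illustrative choice, not the paper's format.
Node = Tuple[int, str]
Tree = List[Node]


def rightmost_path(tree: Tree) -> List[int]:
    """Return the indices of the nodes on the path from the root to the last preorder node."""
    path: List[int] = []
    wanted_depth = tree[-1][0]
    # Walking backwards, the nearest earlier node one level up is the parent.
    for index in range(len(tree) - 1, -1, -1):
        if tree[index][0] == wanted_depth:
            path.append(index)
            wanted_depth -= 1
    return path[::-1]


def extend_rightmost(tree: Tree, label: str) -> List[Tree]:
    """Attach one new node as the last child of each node on the rightmost path."""
    last_depth = tree[-1][0]
    # A new node at depth d becomes the last child of the rightmost-path node at depth d - 1.
    return [tree + [(depth, label)] for depth in range(1, last_depth + 2)]


example: Tree = [(0, "a"), (1, "b"), (2, "c"), (1, "d")]
path = rightmost_path(example)               # indices of "a" and "d"
candidates = extend_rightmost(example, "e")  # one candidate per rightmost-path node
```

Because a new node is only ever appended at the end of the preorder sequence, each extension yields a distinct tree, and there is exactly one extension per node on the rightmost path.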

English abstract:

With the widespread use of XML data streams, discovering knowledge from them has become important. Compared with other frequent pattern mining tasks, mining frequent subtrees over large-scale XML documents and unboundedly growing XML data streams faces particular difficulties: a data stream cannot be parsed in memory as a whole, and mining a partitioned XML data stream must take the semi-structured characteristics of XML data into account. To address these problems, Tmlist is proposed for mining frequent subtrees over a paged XML data stream. Tmlist pages the XML data stream, manages cross-page nodes and frequent candidate subtrees that grow across pages, and mines frequent subtrees page by page. Frequent candidate subtrees grow by inserting frequent candidate nodes on their rightmost path according to the level of their roots, from shallow to deep, which avoids repeated recursive growth of subtrees rooted at low-level nodes. A candidate subtree is identified jointly by its topological sequence and its rightmost path, so growing a subtree requires no prefix matching, and the cost of storing and matching prefix nodes is eliminated. Frequent candidate subtrees are selected page by page according to the page minimum support; the support of each subtree is decayed by a per-page decay factor, and subtrees are pruned accordingly. As a result, Tmlist reduces the memory cost of mining frequent subtrees within a controllable error bound and improves memory utilization and mining efficiency.
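The abstract describes page-wise filtering with a page minimum support and decayed support, but it does not give the formulas. The sketch below is one plausible reading, offered only as an illustration. It treats each pattern as a plain hashable item rather than a subtree, and the function `mine_pages` and the parameters `page_min_support`, `decay`, and `prune_below` are hypothetical names, not taken from the paper.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List


def mine_pages(
    pages: Iterable[List[Hashable]],
    page_min_support: int,
    decay: float,
    prune_below: float,
) -> Dict[Hashable, float]:
    """Track decayed support of patterns across pages (assumed scheme, not the paper's)."""
    support: Dict[Hashable, float] = {}
    for page in pages:
        # Fade the support accumulated on earlier pages.
        support = {pattern: value * decay for pattern, value in support.items()}
        counts = Counter(page)
        for pattern, count in counts.items():
            # A new pattern is admitted only if it meets the page minimum support;
            # an already tracked pattern always accumulates its count.
            if pattern in support or count >= page_min_support:
                support[pattern] = support.get(pattern, 0.0) + count
        # Prune patterns whose decayed support has fallen below the threshold.
        support = {p: s for p, s in support.items() if s >= prune_below}
    return support


pages = [["a", "a", "b"], ["a", "c", "c"], ["c", "c", "c"]]
result = mine_pages(pages, page_min_support=2, decay=0.5, prune_below=1.0)
```

Only the current page and the table of tracked patterns are held in memory at any time. That is the source of the space saving, and the counts discarded by the admission test and the pruning step are what introduce the bounded error.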
