Efficient Mining of Frequent Closed XML Query Pattern

Classification: TP31 [Automation and Computer Technology: Computer Software and Theory; Automation and Computer Technology: Computer Science and Technology]
Author affiliation: [1] Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
Funding: This work is supported in part by the National Natural Science Foundation of China under Grant No. 60573094, the National Grand Fundamental Research 973 Program of China under Grant No. 2006CB303103, the National High Technology Development 863 Program of China under Grant No. 2006AA01A101, and Tsinghua Basic Research Foundation under Grant No. JCqn2005022.
Related project: Key Issues in Pure XML Database Management Systems

Keywords: computer software, frequent closed pattern, data mining, XML, XPath

Electronic supplementary material: The online version of this article (doi:10.1007/s11390-007-9081-z) contains supplementary material, which is available to authorized users.

Abstract:

Previous research has presented convincing arguments that a frequent pattern mining algorithm should not mine all frequent patterns but only the closed ones. The closed patterns yield a more compact yet complete result set, and mining them is also more efficient. Once frequent closed XML query patterns are discovered, indexing and caching can be adopted effectively to improve query performance. Most previous algorithms for finding frequent patterns adopt a straightforward generate-and-test strategy. In this paper, we present SOLARIA*, an efficient algorithm for mining frequent closed XML query patterns without candidate maintenance or costly tree-containment checking. An efficient sequence-mining algorithm is used to discover frequent tree-structured patterns, with the aim of replacing expensive containment testing with cheap parent-child checking in sequences. SOLARIA* uses a parent-child relationship constraint to deeply prune the parts of the search space that are irrelevant to frequent pattern enumeration. Through a thorough experimental study on various real-life data sets, we demonstrate the efficiency and scalability of SOLARIA* over the previously known alternative. SOLARIA* also scales linearly with the size of the XML queries.
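Two of the abstract's key ideas can be illustrated with a small brute-force sketch. The first is closedness: a frequent pattern is closed when no proper super-pattern has the same support. The second is that growth is restricted to attaching a child under a node already in the pattern. This is not SOLARIA*. It assumes every query pattern is a rooted tree with a common root label and distinct sibling labels. Under that assumption, a tree can be stored as the set of its root-to-node label paths. XPath wildcards and descendant (`//`) axes are ignored. Support is counted with a plain subset test per query, so the sketch does not avoid containment checking the way the paper's sequence-based approach does. The names `to_tree` and `mine_closed_patterns` and the sample queries are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Path = Tuple[str, ...]        # root-to-node label path, e.g. ("book", "title")
Pattern = FrozenSet[Path]     # a rooted tree stored as the set of its node paths


def to_tree(paths: Iterable[str]) -> Pattern:
    """Build a prefix-closed path set from slash-separated paths (hypothetical helper)."""
    nodes: Set[Path] = set()
    for p in paths:
        labels = tuple(p.split("/"))
        for i in range(1, len(labels) + 1):
            nodes.add(labels[:i])  # every ancestor of a node is a node too
    return frozenset(nodes)


def mine_closed_patterns(db: List[Pattern], min_sup: int) -> Dict[Pattern, int]:
    """Depth-first growth of rooted subtrees; returns {closed pattern: support}."""
    closed: Dict[Pattern, int] = {}
    seen: Set[Pattern] = set()
    roots = {p for tree in db for p in tree if len(p) == 1}
    stack: List[Pattern] = [frozenset([r]) for r in sorted(roots)]
    while stack:
        pattern = stack.pop()
        if pattern in seen:          # same tree reached via another growth order
            continue
        seen.add(pattern)
        # Projected database: queries containing the pattern (a subset test,
        # valid only under the unique-sibling-label assumption).
        projected = [t for t in db if pattern <= t]
        sup = len(projected)
        if sup < min_sup:
            continue
        # Parent-child constraint: only attach a node whose parent is already
        # in the pattern; all other extensions are never generated.
        candidates = {p for t in projected for p in t
                      if p not in pattern and p[:-1] in pattern}
        is_closed = True
        for ext in sorted(candidates):
            ext_sup = sum(1 for t in projected if ext in t)
            if ext_sup == sup:
                is_closed = False    # a one-node super-pattern has equal support
            if ext_sup >= min_sup:
                stack.append(pattern | {ext})
        if is_closed:
            closed[pattern] = sup
    return closed


# Hypothetical query workload: each query lists its node paths.
queries = [
    ["book/title", "book/author/name"],
    ["book/title", "book/author/name", "book/year"],
    ["book/title", "book/year"],
    ["book/author"],
]
db = [to_tree(q) for q in queries]
for pat, sup in sorted(mine_closed_patterns(db, min_sup=2).items(),
                       key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sup, sorted("/".join(p) for p in pat))
```

With `min_sup=2` the four sample queries yield five closed patterns. The pattern `book/year` is frequent (support 2) but is not reported, because the larger pattern `{book/title, book/year}` has the same support.

Checking only one-node extensions is enough to decide closedness. Suppose a larger super-tree had equal support. Then the extension that adds its shallowest missing node would have equal support too.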
