Electronic supplementary material: The online version of this article (doi:10.1007/s11390-007-9081-z) contains supplementary material, which is available to authorized users.
Previous research has argued convincingly that a frequent pattern mining algorithm should mine only the closed frequent patterns rather than all frequent patterns, because doing so yields a result set that is more compact yet still complete, as well as better efficiency. Once frequent closed XML query patterns have been discovered, indexing and caching can be applied effectively to improve query performance. Most previous algorithms for finding frequent patterns adopt a straightforward generate-and-test strategy. In this paper, we present SOLARIA*, an efficient algorithm for mining frequent closed XML query patterns without candidate maintenance or costly tree-containment checking. An efficient sequence mining algorithm is used to discover frequent tree-structured patterns, with the aim of replacing expensive containment testing with cheap parent-child checking in sequences. SOLARIA* prunes unrelated search space deeply during frequent pattern enumeration by means of a parent-child relationship constraint. Through a thorough experimental study on various real-life data, we demonstrate the efficiency and scalability of SOLARIA* compared with the previously known alternative. SOLARIA* also scales linearly with the size of the XML queries.
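To make the "compact yet complete" property of closed patterns concrete, the following minimal sketch may help. It is not SOLARIA* and does not handle XML query trees: it assumes plain itemsets as a simplified stand-in for tree patterns, and it uses exactly the brute-force generate-and-test enumeration that the paper sets out to avoid. The function names `frequent_itemsets` and `closed_only` and the toy data are hypothetical, chosen only for illustration. A pattern is treated as closed when no proper super-pattern has the same support.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force generate-and-test: return {itemset: support} for every
    itemset contained in at least min_support transactions."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            pattern = frozenset(candidate)
            # Support = number of transactions that contain the pattern.
            support = sum(1 for t in transactions if pattern <= t)
            if support >= min_support:
                result[pattern] = support
    return result


def closed_only(frequent):
    """Keep only closed patterns: those with no proper superset of equal
    support. A superset with equal support is itself frequent, so it is
    enough to look inside the frequent set."""
    return {
        p: s
        for p, s in frequent.items()
        if not any(p < q and s == sq for q, sq in frequent.items())
    }


# Toy data (hypothetical): four transactions over the items a, b, c.
transactions = [frozenset("abc"), frozenset("abc"), frozenset("ab"), frozenset("a")]

frequent = frequent_itemsets(transactions, min_support=2)
closed = closed_only(frequent)

print(len(frequent), "frequent patterns,", len(closed), "closed patterns")
for pattern, support in sorted(closed.items(), key=lambda kv: len(kv[0])):
    print(sorted(pattern), support)
```

On this toy input, seven frequent itemsets reduce to three closed ones, and the support of any dropped pattern can still be recovered as the largest support among the closed patterns that contain it. That is the sense in which the closed result set is smaller without losing information.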