Many application domains produce large volumes of sequence data, and discovering valuable patterns in such data is the central task of sequential pattern mining. This paper studies the following problem: given a sequence S, a support threshold, and gap constraints, find all frequent patterns whose number of occurrences in S is no less than the threshold. A pattern P may contain flexible wildcards, and the number of wildcards between any two successive elements of P must satisfy the user-specified gap constraints. The paper designs an efficient algorithm for mining patterns with wildcards, One-Off Mining, in which occurrences are counted under the One-Off condition: no two occurrences of a pattern share the character at the same position of the sequence, so each character of S is used at most once across all occurrences of a pattern. Experiments on DNA sequences show that One-Off Mining achieves better running time and completeness than related sequential pattern mining algorithms.
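To make the problem setting concrete, the following is a minimal sketch of counting occurrences of a single pattern under gap constraints and the One-Off condition. It is not the One-Off Mining algorithm itself, whose details the abstract does not give. It assumes a simple leftmost-first greedy search, so the count it returns is a lower bound on the best achievable One-Off support rather than a guaranteed maximum. The names `one_off_occurrences`, `is_frequent`, `min_gap`, and `max_gap` are hypothetical, and a single gap range is assumed to apply between every pair of successive pattern elements.

```python
def one_off_occurrences(seq, pattern, min_gap, max_gap):
    """Greedily collect occurrences of `pattern` in `seq`.

    The gap between two successive matched positions p < q is q - p - 1
    (the number of wildcard characters skipped) and must lie in
    [min_gap, max_gap]. Under the One-Off condition, a position of `seq`
    that is used by one occurrence cannot be reused by another.
    """
    if not pattern:
        return []

    used = [False] * len(seq)

    def extend(prev_pos, k):
        # Try to match pattern[k:] after pattern[k-1] was matched at prev_pos.
        if k == len(pattern):
            return []
        first = prev_pos + min_gap + 1
        last = min(prev_pos + max_gap + 1, len(seq) - 1)
        for pos in range(first, last + 1):
            if not used[pos] and seq[pos] == pattern[k]:
                rest = extend(pos, k + 1)
                if rest is not None:
                    return [pos] + rest
        return None  # no valid continuation from prev_pos

    occurrences = []
    for start in range(len(seq)):
        if used[start] or seq[start] != pattern[0]:
            continue
        rest = extend(start, 1)
        if rest is not None:
            occurrence = [start] + rest
            for pos in occurrence:
                used[pos] = True  # enforce the One-Off condition
            occurrences.append(occurrence)
    return occurrences


def is_frequent(seq, pattern, min_gap, max_gap, threshold):
    # A pattern is frequent if its (greedy) One-Off support reaches the threshold.
    return len(one_off_occurrences(seq, pattern, min_gap, max_gap)) >= threshold
```

For example, with `seq = "AAT"`, `pattern = "AT"`, and gaps in `[0, 1]`, both `A` characters could be paired with the single `T`, but the One-Off condition allows that `T` to be used only once, so only one occurrence is counted.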