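This paper proposes a mathematical model for mining patterns in time series of sub-item sets. The model computes and updates the average frequency of each sub-item and, once per pattern inspection period (a time threshold), computes the Pearson correlation between the current real-time frequency vector and the real-time frequency vectors already in the pattern set. If the correlation coefficient is large, the current pattern already exists in the pattern set; if it is small, the current pattern is new and is added to the pattern set. This process runs continuously until the pattern set stabilizes. The paper also examines the sequential relationships between patterns, that is, the patterns between patterns. By setting up a window register and incrementing the count at the corresponding position in a pattern sequence matrix, the model can compute the support and confidence of the ordering between any two patterns. The model focuses on extracting the patterns of sub-item sets and the patterns between those patterns. In addition, by adjusting the inspection time threshold, the model can also extract the patterns inside sub-item-set patterns. In experiments on simulated sub-item-set sequences, we demonstrate the validity and generality of the theoretical model. In practice, we apply the model to Web security: an examination and test on the Sina portal website indicates that the model is highly efficient at defending against Web anomalies.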
This paper proposes an analytical algorithm for mining patterns in time series. The algorithm calculates and updates the average frequency of each sub-set item. In each checking period, we compute the Pearson correlation coefficient between the real-time frequency vector and the vectors already in the pattern set. If the coefficient exceeds a threshold, the current vector is deemed to be already present in the pattern set. If the coefficient is below the threshold, we add the vector to the pattern set and treat it as a new pattern. This process continues until the pattern set becomes steady. The proposed algorithm also examines the order in which patterns occur. By recording new patterns in a temporary memory, the support and confidence values for the ordering between patterns can be derived. In this paper, we concentrate on the patterns, the patterns between patterns, and the patterns inside patterns. In experiments that simulate a sequence of sub-set items, we demonstrate the effectiveness and correctness of the proposed mechanism. In practice, we applied the algorithm to data from www.sina.com and found it to be very effective for Web security.
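The matching step and the ordering statistics described above can be illustrated with a minimal Python sketch. It is not the implementation used in the paper: the function names, the threshold of 0.9, the one-slot stand-in for the temporary memory, and the particular definitions of support (share of all observed transitions) and confidence (share of transitions leaving the first pattern) are assumptions made for illustration. The per-period frequency vectors are taken as given, so the averaging step is not shown.

```python
from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: treat a constant vector as uncorrelated
    return cov / (sx * sy)


def match_or_add(pattern_set, vector, threshold=0.9):
    """Return the index of the first matching pattern; add the vector if none matches."""
    for idx, known in enumerate(pattern_set):
        if pearson(known, vector) >= threshold:
            return idx
    pattern_set.append(list(vector))
    return len(pattern_set) - 1


def mine(vectors, threshold=0.9):
    """Label each period's vector and count pattern-to-pattern transitions."""
    pattern_set = []
    transitions = defaultdict(int)  # (previous, current) -> count
    occurrences = defaultdict(int)  # pattern -> times it was a predecessor
    previous = None  # hypothetical one-slot stand-in for the window register
    labels = []
    for vector in vectors:
        current = match_or_add(pattern_set, vector, threshold)
        labels.append(current)
        if previous is not None:
            transitions[(previous, current)] += 1
            occurrences[previous] += 1
        previous = current
    return pattern_set, labels, transitions, occurrences


def support_confidence(transitions, occurrences, a, b):
    """Support and confidence of 'a is followed by b' (assumed definitions)."""
    total = sum(transitions.values())
    count = transitions.get((a, b), 0)
    support = count / total if total else 0.0
    confidence = count / occurrences[a] if occurrences.get(a) else 0.0
    return support, confidence
```

For example, feeding `mine` the five vectors `[10, 1, 1]`, `[1, 10, 1]`, `[11, 1, 2]`, `[1, 9, 1]`, `[10, 2, 1]` yields two patterns that alternate, and `support_confidence(transitions, occurrences, 0, 1)` then reports how often the first pattern is followed by the second.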