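All-weighted positive and negative association patterns have important theoretical and practical value in text mining, information retrieval, and related fields. To address the shortcomings of existing mining algorithms, this paper constructs SPRMII (support-probability ratio-mutual information-interest), an evaluation framework for all-weighted positive and negative association patterns, and proposes a dual interest threshold pruning strategy for all-weighted itemsets. Based on this pruning strategy, it then proposes AWAPM_SPRMII (all-weighted association patterns mining based on SPRMII), a new algorithm for mining all-weighted positive and negative association patterns under the SPRMII framework. The algorithm overcomes the defects of traditional mining algorithms and uses the new pruning method to mine interesting frequent itemsets and negative itemsets from all-weighted databases. It then mines valid all-weighted positive and negative association rules from these itemsets through a simple computation of the itemset weight-to-dimension ratio together with the SPRMII evaluation framework. Theoretical analysis and experiments show that the algorithm is effective and scales well, and that it achieves good mining performance compared with existing classical mining algorithms.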
All-weighted association pattern mining has important theoretical and practical value in text mining, information retrieval, and related fields. To address the shortcomings of existing mining algorithms, this paper first introduces SPRMII (support-probability ratio-mutual information-interest), an evaluation framework for all-weighted association patterns, together with a dual interest threshold pruning strategy. It then proposes AWAPM_SPRMII (all-weighted association patterns mining based on SPRMII), a novel algorithm built on SPRMII for mining all-weighted positive and negative association patterns in databases. The algorithm not only overcomes the defects of traditional association rule mining and avoids generating ineffective and uninteresting association patterns, but also efficiently mines interesting frequent itemsets and negative itemsets from massive all-weighted databases, and then discovers all-weighted positive and negative association rules using only a simple computation and comparison of the itemset's weight-to-dimension ratio. Theoretical analysis and experimental results on a real-world text dataset show that, compared with traditional mining methods, this approach works more efficiently and discovers all-weighted positive and negative association patterns effectively.
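
To make the "ratio of weight to dimension" concrete, the following is a minimal sketch under stated assumptions and not the paper's own definition, which the abstract does not give. It assumes that each transaction maps items to per-transaction weights, and that the support of an itemset is the summed weight of its items over the transactions containing all of them, divided by the total weight in the database times the itemset size (its dimension). The names `all_weighted_support` and `is_frequent` are hypothetical.

```python
from typing import Dict, FrozenSet, List

# Assumption: a transaction maps each item to its weight in that transaction.
Transaction = Dict[str, float]


def all_weighted_support(itemset: FrozenSet[str], db: List[Transaction]) -> float:
    """Assumed support: weight of the itemset in covering transactions,
    divided by (total database weight * number of items in the itemset)."""
    if not itemset:
        raise ValueError("itemset must be non-empty")
    # Total weight of every item occurrence in the database.
    total_weight = sum(sum(t.values()) for t in db)
    if total_weight == 0:
        return 0.0
    # Sum the itemset's item weights over transactions containing all its items.
    covered_weight = sum(
        sum(t[item] for item in itemset)
        for t in db
        if all(item in t for item in itemset)
    )
    # Dividing by len(itemset) is the "weight to dimension" normalisation.
    return covered_weight / (total_weight * len(itemset))


def is_frequent(itemset: FrozenSet[str], db: List[Transaction], min_support: float) -> bool:
    """Hypothetical frequency test: compare the assumed support with a threshold."""
    return all_weighted_support(itemset, db) >= min_support
```

Under this assumed definition, deciding whether an itemset is frequent needs only one pass to accumulate weights and a single division and comparison. The SPRMII measures and the dual interest threshold pruning are not reproduced here because the abstract does not specify them.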