Associative classification offers high classification accuracy and good extensibility. However, because the classifier is built from high-confidence rules, it sometimes overfits. We therefore start from the frequent itemsets mined by FP-growth and compute the minimal discrepancy between those frequent itemsets and each test record, that is, how well the classification rules match the test data. The test record is assigned the class label whose minimal discrepancy is smallest. Experimental results show that the proposed algorithm, CFPM, achieves higher classification accuracy than earlier algorithms such as CBA (Classification Based on Associations), CMAR (Classification based on Multiple Association Rules), and CPAR (Classification based on Predictive Association Rules). Its drawback is that the gain in accuracy comes at a considerable system cost, because the matrix used to store the frequent itemsets is very large.
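The classification step can be illustrated with a minimal sketch. It is not CFPM itself, and several parts are assumptions. The abstract does not define the discrepancy measure, so the hypothetical `discrepancy` function below uses the fraction of a pattern's items missing from the test record. FP-growth mining is skipped in favour of a hand-written list of class-labelled frequent itemsets. Ties are broken by higher support, which is also an assumption. The names `Pattern`, `discrepancy`, and `classify` are illustrative only.

```python
from typing import Dict, FrozenSet, Hashable, List, Tuple

# A class-labelled frequent itemset: (items, class label, support count).
# In the described method these would come from FP-growth; here they are
# written by hand for illustration.
Pattern = Tuple[FrozenSet[str], Hashable, int]


def discrepancy(pattern: FrozenSet[str], record: FrozenSet[str]) -> float:
    """Hypothetical measure: fraction of the pattern's items absent from the record.

    0.0 means the record contains every item of the pattern (a perfect match).
    """
    if not pattern:
        # Frequent itemsets are non-empty; treat an empty one as a total mismatch.
        return 1.0
    return len(pattern - record) / len(pattern)


def classify(patterns: List[Pattern], record: FrozenSet[str]) -> Hashable:
    """Return the label whose best-matching pattern has the smallest discrepancy.

    For each label, keep its minimal discrepancy; ties are broken in favour of
    the pattern with higher support (an assumption, not from the source).
    """
    best: Dict[Hashable, Tuple[float, int]] = {}
    for items, label, support in patterns:
        key = (discrepancy(items, record), -support)
        if label not in best or key < best[label]:
            best[label] = key
    if not best:
        raise ValueError("no patterns to classify with")
    # The label with the smallest per-label minimal discrepancy wins.
    return min(best, key=lambda label: best[label])


# Toy example with made-up patterns (not data from the paper).
patterns: List[Pattern] = [
    (frozenset({"a", "b"}), "yes", 5),
    (frozenset({"c"}), "no", 7),
    (frozenset({"a", "d"}), "no", 3),
]
print(classify(patterns, frozenset({"a", "b", "e"})))  # → yes
```

Keeping one minimal discrepancy per label mirrors the rule of assigning the class whose minimal discrepancy is smallest; a real implementation would read the patterns from the frequent-itemset matrix rather than from a Python list.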