In multiclass imbalanced datasets, some classes contain very few instances, so associative classification algorithms based on the support-confidence framework generate few rules, or none at all, for these classes. Consequently, instances of these minority classes are difficult to classify correctly. To address this problem, an improved associative classification algorithm for multiclass imbalanced datasets is proposed. To extract more rules for minority classes, rules are extracted according to the positive correlation between itemsets and classes. To raise the priority of minority-class rules, a rule strength based on the class distribution of itemsets is designed to rank the rules. In addition, to handle cases where no rule matches or the matched rules conflict, a k-nearest neighbor (KNN) algorithm is incorporated to classify new instances. Experimental results show that, compared with support-confidence based associative classification, the proposed algorithm extracts more minority-class rules and raises their priority, and achieves higher G-mean and F-score values on multiclass imbalanced datasets.
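The idea of extracting rules by positive correlation rather than by support and confidence can be illustrated with a minimal sketch. The abstract does not state which correlation measure is used, so the sketch assumes lift (rule confidence divided by the class prior), treats a lift above 1 as positive correlation, and restricts itemsets to single items for brevity. The function name `class_rules_by_lift` and the toy data are hypothetical.

```python
from collections import Counter


def class_rules_by_lift(transactions, labels, min_lift=1.0):
    """Return {(item, cls): lift} for single-item rules whose lift exceeds min_lift.

    Lift is used here as an assumed stand-in for the correlation measure.
    """
    n = len(transactions)
    class_count = Counter(labels)
    item_count = Counter()
    joint_count = Counter()
    for items, cls in zip(transactions, labels):
        for item in set(items):
            item_count[item] += 1
            joint_count[(item, cls)] += 1

    rules = {}
    for (item, cls), joint in joint_count.items():
        confidence = joint / item_count[item]  # P(cls | item)
        prior = class_count[cls] / n           # P(cls)
        lift = confidence / prior
        if lift > min_lift:
            rules[(item, cls)] = lift
    return rules


# Toy data: 8 "major" instances and 2 "minor" instances.
transactions = [
    ["a"], ["a"], ["a"], ["a"], ["a", "b"], ["a", "b"], ["c"], ["c"],
    ["b"], ["b", "c"],
]
labels = ["major"] * 8 + ["minor"] * 2

rules = class_rules_by_lift(transactions, labels)
for (item, cls), lift in sorted(rules.items()):
    print(f"{item} -> {cls}: lift = {lift:.3f}")
```

In this toy data the rule b -> minor has a confidence of only 0.5, so a confidence threshold of, say, 0.6 would discard it, yet its lift is well above 1 because the minor class makes up only 20% of the instances. This is the kind of minority-class rule that a correlation-based criterion can retain.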