Imbalance in the training samples directly affects the accuracy of a classifier. This paper proposes PC4.5, an improved algorithm based on C4.5, and applies it to the MIT dataset. Experiments show that PC4.5 handles imbalance in the training set well and reduces the size of the resulting decision tree.