With imbalanced data emerging in large volumes on the Internet, the classification of imbalanced data has become a new research hotspot. Based on how features are distributed across categories, a feature selection method based on inter-class and intra-class distributions is proposed. The method takes full account of the influence of rare-category information on feature selection, so that the constructed category distribution function reflects the information carried by rare features well, and it selects the features that contribute most to imbalanced data classification. Experimental results show that the MacroF1 and MicroF1 of the proposed method are both better than those of Category Distribution-Based Feature Selection (CDFS) and the class information method.
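As a reminder of why both metrics are reported for imbalanced data, the following minimal sketch computes MacroF1 and MicroF1 from their standard definitions. It is not taken from the paper: the function names, the labels "common" and "rare", and the toy predictions are all illustrative assumptions, and the numbers are not experimental results.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(y_true, y_pred):
    """Return (MacroF1, MicroF1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    # Macro: average the per-class F1, so every class weighs the same.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts first, so frequent classes dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Hypothetical toy data: 9 documents of a common class, 1 of a rare class,
# and a classifier that always predicts the common class.
y_true = ["common"] * 9 + ["rare"]
y_pred = ["common"] * 10
macro, micro = macro_micro_f1(y_true, y_pred)
print(f"MacroF1={macro:.3f} MicroF1={micro:.3f}")
```

In this toy case MicroF1 stays high because the common class dominates the pooled counts, while MacroF1 drops sharply because the rare class is missed entirely. A feature selection method aimed at rare categories would therefore be expected to show its effect mainly in MacroF1.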