DPM (Discriminating Power Measure) [1] is an existing feature selection algorithm. It computes the document frequency of each feature in a given category and in all remaining categories, compares the feature's contribution to that category with its contribution to the others, and extracts the feature words with strong category-discriminating power. Building on a study of DPM, this paper proposes an improved feature selection algorithm that also takes into account the per-category term frequency of each feature when computing its category-discriminating power. Experiments show that the improved algorithm achieves better classification performance.
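The baseline DPM idea described above can be illustrated with a minimal sketch. The abstract does not state the scoring formula, so the sketch assumes one plausible formulation: for each category, take the absolute difference between the fraction of in-category documents containing the feature and the fraction of out-of-category documents containing it, then sum over categories. The function name `dpm_scores` and the toy data are hypothetical, and the proposed term-frequency extension is not shown because its exact form is not given here.

```python
from collections import defaultdict


def dpm_scores(docs, labels):
    """Score each term by an assumed DPM-style measure.

    docs   : list of token lists, one per document
    labels : list of category labels, aligned with docs
    Returns a dict mapping term -> score (higher = more discriminating).
    """
    n_total = len(docs)
    class_sizes = defaultdict(int)                 # category -> number of documents
    df_in = defaultdict(lambda: defaultdict(int))  # term -> category -> document frequency
    df_all = defaultdict(int)                      # term -> document frequency overall

    for tokens, label in zip(docs, labels):
        class_sizes[label] += 1
        for term in set(tokens):                   # count each term once per document
            df_in[term][label] += 1
            df_all[term] += 1

    scores = {}
    for term, total_df in df_all.items():
        score = 0.0
        for label, n_in in class_sizes.items():
            n_out = n_total - n_in
            inside = df_in[term].get(label, 0)
            rate_in = inside / n_in
            # Guard against a corpus that has only one category.
            rate_out = (total_df - inside) / n_out if n_out else 0.0
            score += abs(rate_in - rate_out)       # assumed form of the measure
        scores[term] = score
    return scores


# Toy corpus (hypothetical data, for illustration only).
docs = [["ball", "game"], ["ball", "team"], ["vote", "game"], ["vote", "law"]]
labels = ["sport", "sport", "politics", "politics"]
scores = dpm_scores(docs, labels)
top_terms = sorted(scores, key=scores.get, reverse=True)
```

In this toy corpus, "ball" and "vote" each appear in every document of one category and in none of the other, so they receive the highest score, while "game" is spread evenly across both categories and scores zero. Feature selection would then keep the top-ranked terms.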