This paper proposes a hybrid dimensionality reduction method for text features that combines feature selection and feature extraction. First, an improved odds ratio method performs an initial feature selection, and each text is represented as a matrix whose row vectors correspond to class attributes. Then, an improved maximum scatter difference method performs a second-stage feature extraction. In this way, the text features are reduced twice while keeping information loss to a minimum. Classification experiments on Chinese texts show that the proposed method achieves good classification performance.
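The two-stage pipeline can be illustrated with a minimal sketch. The abstract does not specify the authors' improvements to either step, so the code below uses the standard textbook forms as stand-ins: a smoothed log odds ratio for term selection, and the maximum scatter difference criterion, which projects onto the leading eigenvectors of S_b - c * S_w. The function names, the smoothing constant `eps`, the balance parameter `c`, and the max-over-classes scoring rule are all assumptions made for illustration. The class-attribute matrix representation mentioned in the abstract is not modeled here.

```python
import numpy as np

def odds_ratio_scores(X, y, positive_class, eps=1e-6):
    """Smoothed log odds ratio of every term for one class.

    X is a document-term matrix; only term presence (count > 0) is used.
    """
    X = (np.asarray(X) > 0).astype(float)
    y = np.asarray(y)
    pos = X[y == positive_class]
    neg = X[y != positive_class]
    # Estimates of P(t | class) and P(t | other classes), clipped to avoid log(0)
    p_pos = np.clip(pos.mean(axis=0), eps, 1 - eps)
    p_neg = np.clip(neg.mean(axis=0), eps, 1 - eps)
    return np.log(p_pos * (1 - p_neg)) - np.log((1 - p_pos) * p_neg)

def select_features(X, y, k):
    """Stage 1: indices of the k terms with the highest odds ratio for any class."""
    y = np.asarray(y)
    per_class = [odds_ratio_scores(X, y, c) for c in np.unique(y)]
    scores = np.max(per_class, axis=0)  # assumed aggregation: best class per term
    return np.sort(np.argsort(scores)[::-1][:k])

def msd_projection(X, y, d, c=1.0):
    """Stage 2: projection matrix from the maximum scatter difference criterion.

    Returns the d eigenvectors of S_b - c * S_w with the largest eigenvalues.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mean_all = X.mean(axis=0)
    n_features = X.shape[1]
    S_b = np.zeros((n_features, n_features))  # between-class scatter
    S_w = np.zeros((n_features, n_features))  # within-class scatter
    for cls in np.unique(y):
        Xc = X[y == cls]
        mean_c = Xc.mean(axis=0)
        diff = (mean_c - mean_all)[:, None]
        S_b += len(Xc) * (diff @ diff.T)
        centered = Xc - mean_c
        S_w += centered.T @ centered
    # The difference matrix is symmetric, so eigh applies and no inverse is needed
    eigvals, eigvecs = np.linalg.eigh(S_b - c * S_w)
    order = np.argsort(eigvals)[::-1][:d]
    return eigvecs[:, order]
```

A typical use would be `idx = select_features(X, y, k)` followed by `W = msd_projection(X[:, idx], y, d)` and `Z = X[:, idx] @ W`. Because the criterion uses a difference rather than a ratio of scatter matrices, it stays well defined when S_w is singular, which is common for high-dimensional text data.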