Based on the definition of Chinese text categorization and the vector space model (VSM), this paper analyzes the key factors in categorizing texts correctly. After analyzing conventional feature selection methods, it proposes a new feature selection method. Experiments with a support vector machine (SVM) on a medium-sized corpus demonstrate the effectiveness of the method.
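To make "VSM" and "conventional feature selection" concrete, the sketch below shows one widely used conventional criterion, the chi-square (χ²) statistic between a term and a class. It then represents each document as a term-frequency vector over the selected terms. This is not the new method proposed in the paper, which the abstract does not describe. The function names, the plain TF weighting and the toy data are illustrative assumptions, and documents are assumed to be already word-segmented, since Chinese text has no spaces between words.

```python
from collections import Counter


def chi_square(term, cls, docs, labels):
    """Chi-square statistic between a term and a class, using document-level presence.

    a: docs in cls containing term     b: docs not in cls containing term
    c: docs in cls without term        d: docs not in cls without term
    """
    n = len(docs)
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == cls
        if has_term and in_class:
            a += 1
        elif has_term:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def select_features(docs, labels, k):
    """Keep the k terms whose maximum chi-square over all classes is highest (ties broken alphabetically)."""
    vocab = sorted({t for tokens in docs for t in tokens})
    classes = sorted(set(labels))
    scores = {t: max(chi_square(t, c, docs, labels) for c in classes) for t in vocab}
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


def to_vector(tokens, features):
    """VSM representation: raw term frequencies over the selected features; other terms are dropped."""
    counts = Counter(tokens)
    return [counts[f] for f in features]


if __name__ == "__main__":
    # Hypothetical toy corpus of pre-segmented documents.
    docs = [["stock", "rise", "market"], ["stock", "fall"],
            ["match", "team", "goal"], ["team", "match"]]
    labels = ["finance", "finance", "sports", "sports"]
    features = select_features(docs, labels, k=3)
    print(features)
    print(to_vector(["stock", "stock", "team"], features))
```

On this toy corpus, terms that occur in every document of exactly one class ("match", "stock", "team") get the highest score, while terms seen only once are ranked lower. The resulting vectors are what a classifier such as an SVM would consume. In practice TF-IDF weighting and length normalization are common refinements.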