The authors first introduce a position factor into the TF-IDF weight for initial feature selection, yielding an improved text feature weighting model based on position. They then use category information to construct category vectors that strengthen the representation of text categories, and on that basis propose an improved feature weighting model based on category information under the position-weighted scheme. Subsequent text classification experiments show that the proposed weighting model achieves better classification results than the traditional TF-IDF method.
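The position-weighted TF-IDF idea described above can be illustrated with a minimal sketch. The abstract gives no formula, so everything concrete here is an assumption made for illustration: the position names (`title`, `body`), the weight values in `POSITION_WEIGHTS`, the plain `log(N / df)` form of IDF, and the function names `weighted_tf` and `position_tfidf`. It is not the authors' model, and it does not cover the category-vector step.

```python
import math
from collections import Counter

# Hypothetical position weights; the source does not state concrete values.
POSITION_WEIGHTS = {"title": 3.0, "body": 1.0}


def weighted_tf(doc):
    """Return position-weighted term counts for one document.

    `doc` maps a position name to the list of tokens found at that position.
    """
    tf = Counter()
    for position, tokens in doc.items():
        w = POSITION_WEIGHTS.get(position, 1.0)  # unknown positions count once
        for tok in tokens:
            tf[tok] += w
    return tf


def position_tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    n = len(docs)
    tfs = [weighted_tf(d) for d in docs]
    df = Counter()
    for tf in tfs:
        df.update(tf.keys())  # each document counts once per term
    return [{t: c * math.log(n / df[t]) for t, c in tf.items()} for tf in tfs]


docs = [
    {"title": ["football"], "body": ["match", "football", "goal"]},
    {"title": ["market"], "body": ["stock", "match", "price"]},
]
weights = position_tfidf(docs)
```

In this toy corpus, a term that appears in a title receives a larger weight than a term that appears only once in the body, while a term shared by every document ("match") receives weight zero under the assumed IDF form.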