TF-IDF is a commonly used method for weighting text features (terms). It is simple to apply, but it ignores the effect of a term's position in the document on its weight. By modifying the TF-IDF factors to incorporate position information and analyzing how texts differ in form under different conditions, we propose three position-based term weighting methods. Subsequent text categorization experiments show that all three methods achieve better precision than conventional TF-IDF.
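The abstract does not spell out the three position-based schemes, so the following is only a minimal Python sketch of the general idea: conventional TF-IDF next to one hypothetical variant in which each occurrence of a term contributes a position-dependent amount to its term frequency. The names `position_factor`, `lead_boost`, and `lead_fraction`, and the rule of boosting terms near the start of a document, are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Conventional TF-IDF: (count / doc length) * log(N / document frequency)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def position_factor(index, length, lead_boost=2.0, lead_fraction=0.2):
    """Hypothetical position factor (an assumption, not the paper's formula):
    occurrences in the leading part of the document count lead_boost times."""
    return lead_boost if index < lead_fraction * length else 1.0


def positional_tf_idf(docs, factor=position_factor):
    """TF-IDF where each occurrence adds factor(index, length) instead of 1."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        weighted_counts = Counter()
        for index, term in enumerate(doc):
            weighted_counts[term] += factor(index, len(doc))
        weights.append({
            term: (total / len(doc)) * math.log(n_docs / df[term])
            for term, total in weighted_counts.items()
        })
    return weights


# Toy corpus of pre-tokenized documents (illustrative data only).
docs = [
    ["apple", "banana", "apple", "cherry", "date"],
    ["banana", "cherry", "egg", "fig", "grape"],
]
plain = tf_idf(docs)
positional = positional_tf_idf(docs)
```

With a factor that always returns 1.0, `positional_tf_idf` reduces to `tf_idf`, so the position factor is the only thing that changes between the two weightings in this sketch.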