From the perspective of trusted computing, this paper proposes a feature weighting algorithm for text classification based on reliable trust recommendation. The algorithm analyzes how features behave within documents and uses the Beta distribution function to model the trust relationship between features and document categories. On this basis it builds a feature weight computation model and implements a simple, efficient linear text classifier. Comparative experiments on the 20newsgroup corpus and the Fudan Chinese corpus show that the proposed algorithm performs significantly better than TFIDF and adapts well to imbalanced corpora.
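The abstract does not state the exact weighting formula. The sketch below is one plausible reading, and every modeling choice in it is an assumption rather than the authors' model. Documents of category c that contain feature t count as positive evidence r. Documents of other categories that contain t count as negative evidence s. The trust weight is the expected value of Beta(r+1, s+1), (r+1)/(r+s+2), which is the standard Beta reputation estimate. A document is then scored per category by a linear, term-frequency-weighted sum of trust weights. The names `beta_trust`, `train_trust_weights` and `classify` are hypothetical.

```python
from collections import Counter


def beta_trust(r: int, s: int) -> float:
    """Expected value of Beta(r + 1, s + 1), i.e. (r + 1) / (r + s + 2).

    Assumption: r = positive evidence, s = negative evidence for a feature.
    """
    return (r + 1) / (r + s + 2)


def train_trust_weights(docs, labels):
    """Return {category: {feature: trust}} from tokenized docs and labels.

    Assumed evidence counts: r = number of docs of the category containing
    the feature; s = number of docs of other categories containing it.
    """
    classes = sorted(set(labels))
    df_in = {c: Counter() for c in classes}  # per-category document frequency
    df_total = Counter()                     # document frequency over all docs
    for tokens, c in zip(docs, labels):
        for t in set(tokens):                # count each feature once per doc
            df_in[c][t] += 1
            df_total[t] += 1
    return {
        c: {t: beta_trust(df_in[c][t], df_total[t] - df_in[c][t]) for t in df_total}
        for c in classes
    }


def classify(tokens, weights):
    """Linear classifier: score(d, c) = sum_t tf(t, d) * trust(t, c)."""
    tf = Counter(tokens)
    scores = {
        c: sum(n * w.get(t, 0.0) for t, n in tf.items())  # unseen features add 0
        for c, w in weights.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    docs = [["ball", "goal", "team"], ["goal", "match", "team"],
            ["cpu", "code", "bug"], ["code", "compiler", "cpu"]]
    labels = ["sport", "sport", "tech", "tech"]
    w = train_trust_weights(docs, labels)
    print(classify(["goal", "team", "code"], w))  # prints "sport"
```

The Beta mean was chosen because it smooths trust for rare features toward 0.5 rather than producing extreme weights from a single occurrence. Whether the original model also handles class imbalance explicitly is not stated in the abstract, and this sketch does not attempt to.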