Drawing on the characteristics of bad (i.e., harmful or objectionable) text information on the Internet, this paper applies Bayesian theory to design a web page classification method aimed at this kind of information. The method balances classification efficiency against classification accuracy. It speeds up classification by simplifying the word-segmentation step and reducing the dimensionality of the feature space. It preserves classification accuracy by optimizing how feature terms are selected and how their weights are calculated. Experimental results show that the method maintains good classification performance while effectively improving efficiency.
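The abstract does not state which feature-selection criterion or weighting scheme the method uses. The following is a minimal sketch of the general pipeline it describes (select a small set of feature terms, then classify with Bayes' rule) under explicit assumptions: documents are already tokenized, terms are ranked by the chi-square (CHI) statistic, and classification uses multinomial naive Bayes with Laplace smoothing. The names `chi_square_scores`, `NaiveBayes`, `k`, and `alpha` are hypothetical and do not reproduce the authors' own method.

```python
import math
from collections import Counter


def chi_square_scores(docs, labels, positive):
    """Chi-square score of each term against the `positive` class (binary view).

    Assumption: CHI is used here only as one common feature-selection criterion.
    """
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    vocab = set().union(*doc_sets)
    n_pos = sum(1 for y in labels if y == positive)
    scores = {}
    for t in vocab:
        a = sum(1 for s, y in zip(doc_sets, labels) if t in s and y == positive)  # present, positive
        b = sum(1 for s, y in zip(doc_sets, labels) if t in s and y != positive)  # present, negative
        c = n_pos - a           # absent, positive
        d = (n - n_pos) - b     # absent, negative
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[t] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


class NaiveBayes:
    """Multinomial naive Bayes restricted to the top-k chi-square features (hypothetical sketch)."""

    def __init__(self, k=1000, alpha=1.0):
        self.k = k          # number of feature terms kept (reduces feature dimensionality)
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, docs, labels, positive):
        scores = chi_square_scores(docs, labels, positive)
        # Rank by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        self.features = set(ranked[: self.k])
        self.classes = sorted(set(labels))
        self.log_prior = {}
        self.log_like = {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            # Count only the selected feature terms in this class.
            counts = Counter(t for d in class_docs for t in d if t in self.features)
            total = sum(counts.values()) + self.alpha * len(self.features)
            self.log_like[c] = {
                t: math.log((counts[t] + self.alpha) / total) for t in self.features
            }
        return self

    def predict(self, doc):
        # Log-space Bayes rule: log P(c) + sum of log P(t | c) over selected terms in doc.
        def score(c):
            return self.log_prior[c] + sum(
                self.log_like[c][t] for t in doc if t in self.features
            )

        return max(self.classes, key=score)


if __name__ == "__main__":
    train = [
        ["buy", "pills", "cheap"],
        ["cheap", "casino", "win"],
        ["weather", "report", "today"],
        ["football", "report", "score"],
    ]
    labels = ["bad", "bad", "ok", "ok"]
    model = NaiveBayes(k=4).fit(train, labels, positive="bad")
    print(model.predict(["cheap", "pills"]))      # bad
    print(model.predict(["football", "report"]))  # ok
```

Keeping only the top `k` terms is what shrinks the feature space; terms outside that set are simply ignored at prediction time. How the paper reduces word-segmentation work is not described in the abstract, so the sketch sidesteps the question by taking token lists as input.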