To train on and categorize documents obtained from the Internet efficiently, this paper presents a new classifier model. The model introduces a weighting factor for keywords into the traditional Vector Space Model (VSM) and dynamically optimizes the feature vector of each document category during training. To some extent this restores the weights that keywords should actually carry, makes the threshold easier to choose, and makes classification more accurate and efficient. Experiments show that the classifier categorizes documents reasonably, with clearly improved accuracy, and has a degree of learning capability.
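
The abstract does not give the weighting formula, the update rule for the category vectors, or the similarity measure, so the following is only a minimal sketch of the general idea under stated assumptions: keywords receive a multiplicative boost on their term frequency, each category vector is kept as a running mean of its training documents, and a document is assigned to the most similar category by cosine similarity only if the score reaches a threshold. The names `vectorize`, `cosine`, `CentroidClassifier`, `keyword_boost`, and `threshold` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def vectorize(tokens, keyword_boost):
    """Term-frequency vector; designated keywords are scaled by an assumed boost factor."""
    counts = Counter(tokens)
    return {term: count * keyword_boost.get(term, 1.0) for term, count in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


class CentroidClassifier:
    """Hypothetical VSM classifier with keyword boosting and incrementally updated category vectors."""

    def __init__(self, keyword_boost, threshold=0.3):
        self.keyword_boost = keyword_boost  # assumption: term -> multiplicative factor
        self.threshold = threshold          # assumption: minimum similarity to accept a category
        self.centroids = {}                 # label -> (category vector, number of documents seen)

    def train(self, tokens, label):
        """Fold one labelled document into its category vector as a running mean."""
        vec = vectorize(tokens, self.keyword_boost)
        centroid, n = self.centroids.get(label, ({}, 0))
        terms = set(centroid) | set(vec)
        updated = {t: (centroid.get(t, 0.0) * n + vec.get(t, 0.0)) / (n + 1) for t in terms}
        self.centroids[label] = (updated, n + 1)

    def classify(self, tokens):
        """Return the best-matching label, or None if no category reaches the threshold."""
        vec = vectorize(tokens, self.keyword_boost)
        best_label, best_score = None, 0.0
        for label, (centroid, _) in self.centroids.items():
            score = cosine(vec, centroid)
            if score > best_score:
                best_label, best_score = label, score
        return best_label if best_score >= self.threshold else None


clf = CentroidClassifier({"goal": 3.0, "stock": 3.0}, threshold=0.3)
clf.train("the team scored a goal in the match".split(), "sports")
clf.train("the stock market fell as the stock price dropped".split(), "finance")
print(clf.classify("a late goal won the match".split()))
print(clf.classify("weather forecast rain".split()))
```

In this sketch, boosting a keyword widens the gap between the similarity scores of the matching and non-matching categories, which is one plausible reading of why the abstract says the threshold becomes easier to choose.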