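To improve the accuracy of automatic web page classification, a general web page classification model and fusion algorithm are proposed, based on the model theory of information fusion. The model is divided by function into four layers: an information extraction layer, a data preprocessing layer, a feature layer, and a decision layer. The feature layer classifies each kind of media information on a web page with a different classification method, and passes the classification results both to the decision layer and to the other feature-layer components related to that feature-layer algorithm. The decision layer processes the classification results from the feature layer and derives the final fused web page classification result. The model and algorithm were implemented, and experiments show that the proposed fusion model and algorithm can effectively improve the accuracy of automatic web page classification.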
To achieve higher text classification precision, a general feature-layer fusion classification model and algorithm are proposed, based on the model theory of information fusion. The model draws on multiple kinds of information from the network for classification; text and image information are used in this paper. The model consists mainly of two layers. The first is the feature layer, which handles each kind of media information with a different classification algorithm and passes each classification result separately to the higher-layer fusion centre. The second is the decision layer, which processes the results from the feature layer and derives the final classification result. Experiments show that the fusion model can effectively improve text classification precision.
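The two-layer flow described above can be illustrated with a minimal sketch. The abstract does not specify the fusion rule, so the weighted average used here is an assumption made only for illustration, not the authors' algorithm; the function names `fuse` and `classify`, the class labels, the scores, and the weights are likewise hypothetical.

```python
from typing import Dict

Scores = Dict[str, float]  # class label -> probability-like score


def fuse(feature_results: Dict[str, Scores], weights: Dict[str, float]) -> Scores:
    """Decision layer: combine per-medium class scores.

    ASSUMPTION: a normalized weighted average stands in for the
    fusion rule, which the abstract does not describe.
    """
    total = sum(weights[medium] for medium in feature_results)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    fused: Scores = {}
    for medium, scores in feature_results.items():
        w = weights[medium] / total
        for label, score in scores.items():
            fused[label] = fused.get(label, 0.0) + w * score
    return fused


def classify(feature_results: Dict[str, Scores], weights: Dict[str, float]) -> str:
    """Return the label with the highest fused score."""
    fused = fuse(feature_results, weights)
    return max(fused, key=lambda label: fused[label])


# Hypothetical feature-layer outputs for a single page: one classifier
# per medium (text, image), each producing its own class scores.
text_scores = {"sports": 0.55, "finance": 0.45}
image_scores = {"sports": 0.20, "finance": 0.80}

feature_results = {"text": text_scores, "image": image_scores}
weights = {"text": 0.5, "image": 0.5}  # hypothetical equal weighting

label = classify(feature_results, weights)
```

In this toy case the text classifier alone would pick "sports", while the fused decision follows the more confident image classifier. Any other decision-level rule (voting, evidence combination, a learned combiner) could replace `fuse` without changing the surrounding structure.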