Page classification is a fundamental problem in Web information processing, and page classification algorithms provide the theoretical basis for designing and implementing page classifiers. The most typical algorithms in this field currently include decision tree algorithms, Bayesian algorithms, and the KNN algorithm. This paper discusses the theoretical foundations of these typical classification algorithms, analyzes the advantages and disadvantages of each, and finally presents the implementation process of a Web page classifier based on the C4.5 algorithm.
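As a rough illustration of the split criterion that C4.5 is known for (the gain ratio), the following minimal sketch scores categorical page features on a toy data set. The feature names `has_price` and `many_links`, the labels, and the helper functions are hypothetical and chosen only for illustration; they are not taken from the classifier described in this paper, which may differ in every detail.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` by the categorical `values`."""
    total = len(labels)
    # Group the class labels by attribute value.
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy remaining after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


def best_attribute(rows, attributes, label_key="label"):
    """Return the attribute with the highest gain ratio."""
    labels = [row[label_key] for row in rows]
    return max(attributes,
               key=lambda a: gain_ratio([row[a] for row in rows], labels))


# Hypothetical toy pages, assumed for illustration only.
pages = [
    {"has_price": "yes", "many_links": "no", "label": "shop"},
    {"has_price": "yes", "many_links": "yes", "label": "shop"},
    {"has_price": "no", "many_links": "no", "label": "news"},
    {"has_price": "no", "many_links": "yes", "label": "news"},
]

if __name__ == "__main__":
    print(best_attribute(pages, ["has_price", "many_links"]))
```

On this toy data the `has_price` feature separates the two classes completely, so it would be chosen as the root split; a full C4.5 implementation would additionally handle continuous attributes, missing values, and pruning, none of which is sketched here.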