To improve the accuracy of automatic Chinese web page classification, the SVM-KNN method is applied to this task. Based on an analysis of the characteristics of web pages, a vector representation of Chinese web pages is proposed. After all downloaded pages are represented as vectors in a vector space, a multi-class classifier is constructed with SVM. At classification time, the distance in feature space between a page's vector and the separating hyperplane determines whether the page is classified by the SVM method or by the KNN method. Experiments show that the method is effective. For every category of web pages, SVM-KNN achieves higher classification accuracy than SVM alone, and it also alleviates the difficulty of choosing kernel parameters when training the SVM.
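The abstract does not spell out the decision rule. The sketch below assumes the commonly described SVM-KNN formulation for one binary SVM. Its decision function is g(x) = Σ αᵢ yᵢ K(xᵢ, x) + b, and the distance to the hyperplane is |g(x)| / ‖w‖. Because ‖w‖ is fixed for a trained model, the sketch thresholds |g(x)| directly. If |g(x)| exceeds a threshold `eps`, the SVM label sign(g(x)) is used. Otherwise a k-nearest-neighbour vote is taken over the support vectors, with distance measured in the kernel-induced feature space. The names `BinarySVM`, `svm_knn_predict`, `eps`, the RBF kernel with `gamma=0.5`, and the toy numbers are all hypothetical illustrations, not the paper's implementation. Training is omitted. The paper's multi-class classifier would presumably apply this rule inside each binary sub-classifier, which is also an assumption.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def rbf_kernel(x: Vector, z: Vector, gamma: float = 0.5) -> float:
    """Gaussian (RBF) kernel K(x, z) = exp(-gamma * ||x - z||^2); gamma is hypothetical."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


@dataclass
class BinarySVM:
    """An already-trained binary SVM (hypothetical container): support vectors x_i,
    labels y_i in {+1, -1}, dual coefficients alpha_i and bias b. Training is not shown."""
    support_vectors: List[Vector]
    labels: List[int]
    alphas: List[float]
    bias: float
    kernel: Callable[[Vector, Vector], float] = rbf_kernel

    def decision(self, x: Vector) -> float:
        # g(x) = sum_i alpha_i * y_i * K(x_i, x) + b;
        # the geometric distance to the hyperplane is |g(x)| / ||w||.
        return self.bias + sum(
            a * y * self.kernel(sv, x)
            for sv, y, a in zip(self.support_vectors, self.labels, self.alphas)
        )

    def feature_space_dist2(self, x: Vector, z: Vector) -> float:
        # Squared distance in feature space: K(x, x) - 2 K(x, z) + K(z, z).
        return self.kernel(x, x) - 2.0 * self.kernel(x, z) + self.kernel(z, z)


def svm_knn_predict(model: BinarySVM, x: Vector, eps: float, k: int = 1) -> Tuple[int, str]:
    """Return (label, method used). eps is a hypothetical threshold on |g(x)|."""
    g = model.decision(x)
    if abs(g) > eps:
        # Far from the separating hyperplane: use the SVM decision directly.
        return (1 if g > 0 else -1), "svm"
    # Close to the hyperplane: k-NN vote over the support vectors, with distance
    # measured in the kernel-induced feature space (use an odd k to avoid ties).
    nearest = sorted(
        range(len(model.support_vectors)),
        key=lambda i: model.feature_space_dist2(x, model.support_vectors[i]),
    )[:k]
    vote = sum(model.labels[i] for i in nearest)
    return (1 if vote > 0 else -1), "knn"


if __name__ == "__main__":
    # Toy model with two support vectors; all numbers are made up for illustration.
    model = BinarySVM(
        support_vectors=[(0.0, 0.0), (2.0, 0.0)],
        labels=[-1, 1],
        alphas=[1.0, 1.0],
        bias=0.0,
    )
    for point in [(3.0, 0.0), (1.2, 0.0), (0.8, 0.0), (-1.0, 0.0)]:
        print(point, svm_knn_predict(model, point, eps=0.5))
```

In this toy run, the outer points (3, 0) and (-1, 0) lie far enough from the hyperplane to be labelled by the SVM. The inner points (1.2, 0) and (0.8, 0) fall inside the `eps` band and are labelled by the nearest support vector. Using the support vectors as the KNN reference set follows the usual motivation for SVM-KNN: close to the hyperplane, the single SVM decision is least reliable, so local neighbourhood information is consulted instead.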