To improve the performance of hypertext classification while reducing algorithmic complexity, a weighted hypersphere support vector machine algorithm for hypertext classification was developed. The algorithm combines the content information and the hyperlink information of each hypertext document into its feature vector. When the class sizes in the training set are uneven, the traditional hypersphere support vector machine biases its classification errors towards the class with fewer samples; weight factors were introduced to compensate for the adverse effect of this class imbalance on generalization performance. Experiments on a benchmark data set show that the proposed algorithm reduces the complexity of the quadratic programming problem and improves the classification performance of the classifier.
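The abstract does not state the exact optimization problem or the form of the weight factors. The following is a minimal sketch under stated assumptions, not the authors' method. It fits one SVDD-style hypersphere per class by solving the dual with an SMO-style pairwise solver. The class imbalance is countered with a hypothetical per-class penalty C_k = 1 / (nu * n_k), so every class may leave at most the same fraction `nu` of its samples at the bound (outside the sphere). A new sample goes to the class with the smallest relative distance d²/R². The names `fit_sphere`, `train`, `predict`, and `nu`, the linear kernel standing in for a kernel on content and hyperlink features, and the decision rule are all illustrative assumptions.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(u: Vector, v: Vector) -> float:
    """Dot product; a stand-in (assumption) for a kernel on content + hyperlink features."""
    return sum(x * y for x, y in zip(u, v))


def fit_sphere(points: List[Vector], C: float, kernel: Kernel = linear_kernel,
               tol: float = 1e-9, max_iter: int = 10_000) -> Tuple[List[float], float, float]:
    """SMO-style solver for the hypersphere (SVDD) dual
        max_a  sum_i a_i K_ii - sum_ij a_i a_j K_ij   s.t.  sum_i a_i = 1, 0 <= a_i <= C.
    Returns (alpha, squared radius R^2, a^T K a)."""
    n = len(points)
    if C * n < 1.0 - 1e-12:
        raise ValueError("infeasible: need C * n >= 1")
    K = [[kernel(p, q) for q in points] for p in points]
    a = [1.0 / n] * n  # feasible start because 1/n <= C
    # Gradient of the dual objective: g_i = K_ii - 2 (K a)_i
    g = [K[i][i] - 2.0 * sum(K[i][k] * a[k] for k in range(n)) for i in range(n)]
    for _ in range(max_iter):
        up = [k for k in range(n) if a[k] < C - 1e-12]  # may still gain weight
        down = [k for k in range(n) if a[k] > 0.0]      # may still give weight
        if not up:
            break
        i = max(up, key=lambda k: g[k])
        j = min(down, key=lambda k: g[k])
        if g[i] - g[j] < tol:  # KKT conditions hold to within tol
            break
        # Move weight from j to i; exact maximizer of the 1-D quadratic, then clip to the box.
        eta = K[i][i] + K[j][j] - 2.0 * K[i][j]
        step = (g[i] - g[j]) / (2.0 * eta) if eta > 0.0 else float("inf")
        step = min(step, C - a[i], a[j])
        a[i] += step
        a[j] -= step
        for k in range(n):
            g[k] -= 2.0 * step * (K[k][i] - K[k][j])
    aKa = sum(a[p] * K[p][q] * a[q] for p in range(n) for q in range(n))
    up_g = [g[k] for k in range(n) if a[k] < C - 1e-12]
    down_g = [g[k] for k in range(n) if a[k] > 0.0]
    # The KKT threshold lam lies between max(up_g) and min(down_g); d^2(x_k) = g_k + a^T K a.
    lam = 0.5 * (max(up_g) + min(down_g)) if up_g else min(down_g)
    return a, lam + aKa, aKa


def train(data: Dict[str, List[Vector]], nu: float = 0.2, kernel: Kernel = linear_kernel):
    """One sphere per class with hypothetical weighting C_k = 1 / (nu * n_k).
    Since sum_i a_i = 1 and a_i <= C_k, at most nu * n_k samples of class k can sit at
    the bound (outside the sphere): the same fraction for large and small classes."""
    if not 0.0 < nu <= 1.0:
        raise ValueError("nu must lie in (0, 1]")
    model = {}
    for label, pts in data.items():
        a, r2, aKa = fit_sphere(pts, 1.0 / (nu * len(pts)), kernel)
        model[label] = (pts, a, r2, aKa)
    return model


def predict(model, x: Vector, kernel: Kernel = linear_kernel) -> str:
    """Assign x to the class whose sphere gives the smallest relative distance d^2(x) / R^2."""
    def rel_dist(entry) -> float:
        pts, a, r2, aKa = entry
        d2 = kernel(x, x) - 2.0 * sum(ai * kernel(p, x) for ai, p in zip(a, pts)) + aKa
        return d2 / max(r2, 1e-12)
    return min(model, key=lambda label: rel_dist(model[label]))
```

A small usage example with an imbalanced two-class training set (eight samples versus three):

```python
data = {
    "a": [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (0.2, 0.8), (0.8, 0.2), (0.5, 0.1)],
    "b": [(5, 5), (6, 5), (5, 6)],
}
model = train(data, nu=0.2)
print(predict(model, (0.4, 0.6)), predict(model, (5.5, 5.4)))  # a b
```

With a single shared C, a smaller class would have a larger share of its samples allowed at the bound. Scaling C_k inversely with n_k is one simple way to equalize that share. Whether it matches the paper's weight factors cannot be determined from the abstract.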