Traditional clustering-based multi-class SVM methods ignore class-label information during clustering, so the resulting binary tree usually has many branches, and performance degrades markedly when samples from different classes have similar features. To address this problem, linear discriminant analysis (LDA) is introduced into the construction of the binary tree: before each clustering step, the training samples are preprocessed by projecting them into an optimal subspace in which samples of the same class are gathered together and samples of different classes are spread apart. This optimizes the structure of the binary tree and thereby improves classification performance. Experiments on UCI data sets show that the method reduces the number of branches in the binary tree and improves classification accuracy.
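As a rough illustration of the idea of applying LDA before each split of the binary tree, the following minimal sketch builds the tree structure only. It is not the authors' implementation and rests on several assumptions: only the leading Fisher direction is used (found by power iteration on Sw^-1 Sb with a small ridge term), the clustering step is replaced by a simple cut at the widest gap between projected class means, and no SVM is trained at the nodes. All function names (`lda_direction`, `split_classes`, `build_tree`) and the toy data are hypothetical.

```python
import math

def class_means(X, y):
    """Map each class label to the mean vector of its samples."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {c: [sum(col) / len(rows) for col in zip(*rows)]
            for c, rows in groups.items()}

def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def lda_direction(X, y, ridge=1e-6, iters=50):
    """Leading Fisher direction (top eigenvector of Sw^-1 Sb) by power iteration."""
    d = len(X[0])
    means = class_means(X, y)
    overall = [sum(col) / len(X) for col in zip(*X)]
    # Within-class scatter Sw (ridge on the diagonal, an assumption for stability)
    # and between-class scatter Sb.
    Sw = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    Sb = [[0.0] * d for _ in range(d)]
    for x, label in zip(X, y):
        dw = [a - m for a, m in zip(x, means[label])]
        db = [m - o for m, o in zip(means[label], overall)]
        for i in range(d):
            for j in range(d):
                Sw[i][j] += dw[i] * dw[j]
                Sb[i][j] += db[i] * db[j]
    # Start from the largest class-mean offset so that Sb v is nonzero.
    v = max(([m - o for m, o in zip(mu, overall)] for mu in means.values()),
            key=lambda u: sum(c * c for c in u))
    for _ in range(iters):
        v = solve(Sw, [sum(Sb[i][j] * v[j] for j in range(d)) for i in range(d)])
        norm = math.sqrt(sum(c * c for c in v))
        if norm == 0.0:  # all class means coincide; no discriminant direction
            break
        v = [c / norm for c in v]
    # Fix the sign so the result is deterministic.
    first = next((c for c in v if c != 0.0), 1.0)
    return [c if first > 0 else -c for c in v]

def split_classes(X, y):
    """Project class means onto the LDA direction and cut at the widest gap."""
    w = lda_direction(X, y)
    proj = sorted((sum(wi * mi for wi, mi in zip(w, mu)), c)
                  for c, mu in class_means(X, y).items())
    cut = max(range(1, len(proj)), key=lambda k: proj[k][0] - proj[k - 1][0])
    return [c for _, c in proj[:cut]], [c for _, c in proj[cut:]]

def build_tree(X, y):
    """Nested tuples of class labels; each tuple would host one binary SVM."""
    classes = sorted(set(y))
    if len(classes) == 1:
        return classes[0]
    left, right = split_classes(X, y)

    def subset(group):
        keep = [(x, c) for x, c in zip(X, y) if c in group]
        return [x for x, _ in keep], [c for _, c in keep]

    return build_tree(*subset(left)), build_tree(*subset(right))

# Toy data: classes "a" and "b" lie close together, "c" lies far away.
square = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
X = [[px + dx, py] for dx in (0.0, 2.0, 10.0) for px, py in square]
y = [label for label in "abc" for _ in square]
tree = build_tree(X, y)
print(tree)  # (('a', 'b'), 'c')
```

In this sketch the distant class is separated first and the two similar classes are left for a deeper node. In a full classifier, each tuple in the nested result would correspond to one binary SVM trained to separate its left group from its right group.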