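The incremental support vector machine (ISVM) model learns by adding one sample or one batch of samples at a time, decomposing a large-scale problem into a series of subproblems and thereby improving the learning efficiency of the support vector machine (SVM) on large-scale data. However, an inappropriate method for selecting incremental samples in the traditional ISVM (TISVM) model may reduce its efficiency and generalization ability. To address the problem of incremental sample selection in ISVM, an ISVM algorithm based on the probability density distribution, called PISVM, is proposed. The method uses the probability density distribution to select, for training, incremental samples that carry more important classification information (those likely to become support vectors), so that the classifier converges to the optimum as quickly as possible. Experimental results on standard UCI data sets show that the PISVM model can further improve learning efficiency while maintaining its generalization ability.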
The incremental support vector machine (ISVM) model learns by adding one sample or one batch of samples in each cycle, so that a large-scale problem is reduced to a series of subproblems. ISVM can therefore improve the efficiency of the support vector machine (SVM) on large-scale data. However, in the traditional incremental support vector machine (TISVM), the convergence speed, efficiency, and eventual generalization ability may be degraded by a poor selection of the incremental samples. To solve this problem, an ISVM approach based on the probability density distribution (PISVM) is proposed, which uses the probability density distribution to choose incremental training samples that carry much important classification information. This allows the classifier to reach the optimal hyperplane as quickly as possible. To verify the validity of the proposed approach, experiments are carried out with three approaches: PISVM, TISVM, and the minimum distance classifier. The experimental results on UCI data sets demonstrate that the proposed PISVM achieves high learning efficiency while retaining good generalization performance.
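The incremental cycle summarized above can be illustrated with a minimal sketch. The abstract does not give the exact density-based selection rule, so everything here is an assumption made for illustration: the score (ratio of the smaller to the larger class-conditional Gaussian kernel density), the linear SVM trained by batch subgradient descent, and the names `kernel_density`, `ambiguity`, `train_linear_svm`, and `incremental_fit`. It is not the authors' PISVM implementation.

```python
import math

def kernel_density(x, points, bandwidth=1.0):
    """Mean Gaussian kernel between x and the given points (unnormalised density)."""
    if not points:
        return 0.0
    total = 0.0
    for p in points:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, p))
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return total / len(points)

def ambiguity(x, pos, neg, bandwidth=1.0):
    """ASSUMED score: close to 1 when both class densities at x are similar,
    i.e. x lies where the classes overlap and may become a support vector."""
    dp = kernel_density(x, pos, bandwidth)
    dn = kernel_density(x, neg, bandwidth)
    return min(dp, dn) / (max(dp, dn) + 1e-12)

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Batch subgradient descent on the regularised hinge loss; returns (w, b)."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # only margin violators contribute to the subgradient
                for j in range(dim):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

def incremental_fit(X, y, init_size=4, batch=2, rounds=2, bandwidth=1.0):
    """Train on a seed set, then add the most ambiguous pool samples each cycle.
    The first init_size samples are assumed to contain both classes."""
    train_idx = list(range(init_size))
    pool = list(range(init_size, len(X)))
    w, b = train_linear_svm([X[i] for i in train_idx], [y[i] for i in train_idx])
    for _ in range(rounds):
        if not pool:
            break
        pos = [X[i] for i in train_idx if y[i] > 0]
        neg = [X[i] for i in train_idx if y[i] < 0]
        pool.sort(key=lambda i: ambiguity(X[i], pos, neg, bandwidth), reverse=True)
        train_idx += pool[:batch]   # increment: highest-scoring samples
        pool = pool[batch:]
        w, b = train_linear_svm([X[i] for i in train_idx], [y[i] for i in train_idx])
    return w, b, train_idx

# Toy data (hypothetical): two clusters, with samples 6 and 7 far from the boundary.
X = [(2.0, 2.5), (-2.0, -2.5), (3.0, 2.0), (-3.0, -2.0), (0.5, 0.8),
     (-0.5, -0.7), (4.0, 4.0), (-4.0, -4.0), (1.0, 0.4), (-0.8, -1.0)]
y = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1]

w, b, used = incremental_fit(X, y, init_size=4, batch=2, rounds=2)
print(sorted(used))
```

In this toy run the two samples far from the class overlap are never added to the training set, yet the resulting classifier still labels them correctly. This mirrors, in a very simplified way, the claim that training on informative increments can preserve generalization while reducing the amount of data processed.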