The Separating HyperPlane (SHP) method, developed for direct marketing campaigns, constructs a set of Linear HyperPlane (LHP) functions whose Vapnik-Chervonenkis (VC) dimension is at most 9, and the resulting LHP classifies unseen instances quickly while preserving data privacy. However, SHP trains slowly, is sensitive to the distribution of the training samples, and cannot handle nonlinear problems. To overcome these drawbacks, this paper proposes a nonlinear classification method suited to large datasets, called Fast Ensemble of Separating HyperPlane (FE-SHP). First, the training data are split into several subsets and a suboptimal LHP is constructed for each subset. Then, a Radial Basis Function (RBF) is used to improve the nonlinear capability of the suboptimal LHPs, and an optimized weight vector is introduced to strengthen their nonlinear ensemble. Finally, the ensemble output is mapped to a probability, and the related parameters are solved by gradient descent, maximizing the log-likelihood of the training data (equivalently, minimizing the cross-entropy error). Experimental results on UCI datasets show that FE-SHP is competitive on large datasets.
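The pipeline described above (partition, per-subset hyperplanes, RBF transform, weighted ensemble, probability output, gradient descent on the cross-entropy) can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the SHP solver, the exact RBF form, or the partitioning rule, so the code below assumes a least-squares fit as a stand-in for SHP, a random partition, and a Gaussian RBF applied to each hyperplane's signed score. All function names and hyperparameters (`fit_hyperplane`, `rbf_outputs`, `train_fe_shp`, `predict_proba`, `gamma`, `lr`, `epochs`) are hypothetical.

```python
import numpy as np

def fit_hyperplane(X, y):
    # ASSUMPTION: least-squares fit to labels in {-1, +1}, used here only as a
    # stand-in for the SHP solver, which the abstract does not specify.
    A = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef  # weights followed by the bias term

def rbf_outputs(X, planes, gamma=1.0):
    # Signed score of every sample under every hyperplane, shape (n, K).
    A = np.hstack([X, np.ones((len(X), 1))])
    scores = A @ planes.T
    # ASSUMPTION: Gaussian bumps centred on the target scores +1 and -1;
    # the result is positive exactly when the linear score is positive.
    return np.exp(-gamma * (scores - 1) ** 2) - np.exp(-gamma * (scores + 1) ** 2)

def train_fe_shp(X, y, n_subsets=4, gamma=1.0, lr=0.5, epochs=500, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: split the training data (randomly, by assumption) and fit one
    # suboptimal hyperplane per subset.
    parts = np.array_split(rng.permutation(len(X)), n_subsets)
    planes = np.array([fit_hyperplane(X[idx], y[idx]) for idx in parts])
    # Step 2: nonlinear (RBF) transform of each hyperplane's output.
    H = rbf_outputs(X, planes, gamma)
    # Step 3: learn ensemble weights v and bias c by gradient descent on the
    # mean cross-entropy of the sigmoid (probability) output.
    t = (y + 1) / 2  # targets in {0, 1}
    v, c = np.zeros(n_subsets), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(H @ v + c)))
        grad = p - t  # derivative of the cross-entropy w.r.t. the logit
        v -= lr * (H.T @ grad) / len(X)
        c -= lr * grad.mean()
    return planes, v, c

def predict_proba(X, planes, v, c, gamma=1.0):
    # Probability of the positive class for each row of X.
    return 1.0 / (1.0 + np.exp(-(rbf_outputs(X, planes, gamma) @ v + c)))

# Usage on synthetic two-class data (illustrative only, not a UCI dataset).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (200, 2)), rng.normal(2, 1, (200, 2))])
y = np.concatenate([-np.ones(200), np.ones(200)])
planes, v, c = train_fe_shp(X, y)
accuracy = np.mean((predict_proba(X, planes, v, c) > 0.5) == (y > 0))
print(f"training accuracy: {accuracy:.3f}")
```

Because each subset is fitted independently, the hyperplane-construction step in this sketch scales with the subset size and not the full sample size, which is one plausible reading of why a partitioned ensemble would suit large datasets. Only the ensemble weights are trained on the full data here; whether the RBF parameters are also optimized in FE-SHP is not stated in the abstract.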