Abstract: To address the sensitivity of the Multiple Kernel Boosting (MKBoost) algorithm to noise, a multiple kernel boosting algorithm suited to noise-contaminated datasets is proposed, motivated by the observation that an ideal classifier treats normal samples and noisy samples differently. K-nearest neighbors (KNN) is fused with logistic regression to construct a sample noise probability function, which gives the probability that each sample is noise. A new loss function is built from this noise probability, and an additive model is used to derive the coefficient of the base classifier in each boosting iteration. Experiments on UCI datasets show that the proposed algorithm effectively reduces the sensitivity of MKBoost to noise and improves its robustness.
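
The abstract does not give the functional form of the noise probability function, so the following is only a minimal sketch of one way KNN and a logistic function could be combined, not the authors' formulation. It assumes the KNN signal is the fraction of a sample's k nearest neighbors whose label differs from its own, and that this fraction is passed through a logistic curve. The function names `knn_disagreement` and `noise_probability`, and the slope and midpoint parameters `a` and `b`, are hypothetical and chosen for illustration.

```python
import numpy as np


def knn_disagreement(X, y, k=5):
    """Fraction of each sample's k nearest neighbors with a different label."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    # A sample must not count as its own neighbor.
    np.fill_diagonal(dist, np.inf)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return (y[nearest] != y[:, None]).mean(axis=1)


def noise_probability(X, y, k=5, a=10.0, b=0.5):
    """Logistic squashing of the disagreement rate.

    a (slope) and b (midpoint) are assumed values, not taken from the paper.
    """
    d = knn_disagreement(X, y, k)
    return 1.0 / (1.0 + np.exp(-a * (d - b)))


# Two well-separated clusters; the sample at index 4 carries a flipped label.
X = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05],
    [10.0, 10.0], [10.1, 10.0], [10.0, 10.1], [10.1, 10.1], [10.05, 10.05],
])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
p = noise_probability(X, y, k=3)
```

In this toy example the flipped sample is surrounded entirely by neighbors of the other class, so it receives a noise probability close to 1, while the remaining samples stay well below 0.5. How such probabilities enter the loss function and the base classifier coefficients is not specified in the abstract and is left out of the sketch.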