To address the shortcomings of SMOTE (Synthetic Minority Over-sampling Technique) in synthesizing new minority-class samples, an improved SMOTE algorithm (SSMOTE) is proposed. The key idea of SSMOTE is to introduce the concept of support and roulette wheel selection into SMOTE and to make full use of the distribution of heterogeneous (other-class) nearest neighbors, which gives fine-grained control over both the quality and the quantity of the synthesized minority-class samples. SSMOTE is combined with the KNN (K-Nearest Neighbor) algorithm to handle classification on imbalanced datasets. Extensive experiments on UCI datasets, comparing SSMOTE with related algorithms from the pertinent literature, show that SSMOTE performs well in the overall synthesis of new minority-class samples and effectively improves the classification performance of KNN on imbalanced datasets.
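The combination of a per-sample support score with roulette wheel selection can be illustrated with a minimal sketch. The abstract does not define support, so the code below assumes a hypothetical definition: the share of same-class points among a minority sample's k nearest neighbors (that is, one minus the share of heterogeneous neighbors). It also assumes that samples with higher support are selected more often as seeds; the actual SSMOTE rule may differ. The function names `support_scores` and `roulette_smote` are illustrative only, and the interpolation step is the standard SMOTE one.

```python
import numpy as np

def knn_indices(X, i, k):
    """Indices of the k nearest neighbours of X[i], excluding i itself."""
    d = np.linalg.norm(X - X[i], axis=1)
    d[i] = np.inf  # never return the point as its own neighbour
    return np.argsort(d)[:k]

def support_scores(X, y, minority_label, k=5):
    """ASSUMED definition of 'support' (not taken from the paper):
    fraction of same-class points among each minority sample's
    k nearest neighbours in the full dataset."""
    idx = np.flatnonzero(y == minority_label)
    scores = np.empty(len(idx))
    for j, i in enumerate(idx):
        nn = knn_indices(X, i, k)
        scores[j] = np.mean(y[nn] == minority_label)
    return idx, scores

def roulette_smote(X, y, minority_label, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples. Seeds are drawn by
    roulette wheel selection with probability proportional to the
    assumed support score; each new sample is a SMOTE-style
    interpolation between the seed and one of its minority neighbours."""
    rng = np.random.default_rng(seed)
    idx, scores = support_scores(X, y, minority_label, k)
    weights = scores + 1e-6            # small floor: every sample stays selectable
    probs = weights / weights.sum()    # roulette wheel slice sizes
    X_min = X[idx]
    k_min = min(k, len(X_min) - 1)     # needs at least two minority samples
    synthetic = np.empty((n_new, X.shape[1]))
    for t in range(n_new):
        s = rng.choice(len(X_min), p=probs)   # spin the wheel
        nb = rng.choice(knn_indices(X_min, s, k_min))
        gap = rng.random()                    # interpolation factor in [0, 1)
        synthetic[t] = X_min[s] + gap * (X_min[nb] - X_min[s])
    return synthetic
```

In this sketch, `n_new` controls the quantity of synthetic samples and the support-weighted wheel controls which regions of the minority class they come from. The synthetic points would then be appended to the training set before fitting a KNN classifier.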