A data set is called highly skewed, or imbalanced, if at least one of its classes is represented by significantly fewer samples than the others. The class imbalance problem arises when the target concept has significantly fewer observations than the other classes. Imbalanced data are common in practice, and many real-world classification tasks suffer from this problem. Class imbalance is known to hinder the performance of traditional machine learning classification algorithms. The support vector machine (SVM), a machine learning method developed in recent years, is based on the structural risk minimization principle. This paper analyzes the application of SVM to imbalanced data sets and presents several of the main improved methods for applying SVM to such data, which achieve good classification performance on imbalanced data.