Building on support vector data description (SVDD), an algorithm for one-class classification problems, this paper proposes I-SVDD (imbalance support vector data description), an algorithm for two-class imbalanced classification problems. I-SVDD adds information about the sample distribution and uses it to redefine the penalty parameter C of SVDD with negative samples (outliers), giving each sample its own C value. Experiments on UCI datasets and artificial datasets show that, compared with SVDD with negative samples, I-SVDD raises the AUC by more than 12% on average. Compared with AdaBoost, it raises the recall of the positive class by 35% on average and the precision by more than 2%. While keeping classification accuracy high on the minority class, I-SVDD also improves classification accuracy over the whole sample set, which better matches how real-world imbalanced problems require minority-class samples to be handled.
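The abstract does not give the formula I-SVDD uses to set each sample's C value. For orientation only, here is the usual formulation of SVDD with negative samples, rewritten with a separate penalty $C_i$ per sample, which is the slot a per-sample redefinition of C would fill. The labels $y_i = +1$ (target) and $y_i = -1$ (outlier) are our notation, and we assume, without confirmation from the abstract, that I-SVDD keeps this primal/dual structure and only changes how $C_i$ is chosen. Setting $C_i = C_1$ for all targets and $C_i = C_2$ for all outliers recovers the two-constant SVDD with negative samples.

```latex
% Primal: enclose targets in a ball of radius R around center a, keep outliers outside
\begin{aligned}
\min_{R,\,\mathbf{a},\,\boldsymbol{\xi}}\quad & R^2 + \sum_{i=1}^{n} C_i\,\xi_i \\
\text{s.t.}\quad & y_i\,\lVert \mathbf{x}_i - \mathbf{a} \rVert^2 \le y_i R^2 + \xi_i,
  \qquad \xi_i \ge 0,\qquad i = 1,\dots,n
\end{aligned}

% Dual (kernel K), obtained from dL/dR^2 = 0, dL/da = 0, dL/dxi_i = 0
\begin{aligned}
\max_{\boldsymbol{\alpha}}\quad & \sum_{i} \alpha_i y_i K(\mathbf{x}_i,\mathbf{x}_i)
  - \sum_{i,j} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i,\mathbf{x}_j) \\
\text{s.t.}\quad & \sum_{i} \alpha_i y_i = 1,\qquad 0 \le \alpha_i \le C_i
\end{aligned}

% Center and decision rule: z is accepted as a target sample when
\mathbf{a} = \sum_i \alpha_i y_i \mathbf{x}_i,\qquad
K(\mathbf{z},\mathbf{z}) - 2\sum_i \alpha_i y_i K(\mathbf{x}_i,\mathbf{z})
  + \sum_{i,j} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i,\mathbf{x}_j) \le R^2
```

The per-sample upper bound $0 \le \alpha_i \le C_i$ is where distribution information would enter: a larger $C_i$ lets sample $i$ pull harder on the boundary.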
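The results above are stated in terms of AUC, positive-class recall, and precision. The sketch below shows how these are computed on a toy imbalanced set, assuming the minority class is labeled 1 ("positive"). The data, the 0.5 threshold, and the function names are illustrative and do not come from the paper. AUC is computed with the rank (Mann-Whitney) definition.

```python
# Evaluation metrics named in the abstract, computed on a toy imbalanced set.
# Assumption: the minority class is labeled 1 ("positive"), the majority 0.


def auc_rank(scores, labels):
    """AUC as P(score of a random positive > score of a random negative), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def recall_precision(preds, labels, positive=1):
    """Recall and precision of the positive class; 0.0 when a denominator is empty."""
    tp = sum(1 for p, y in zip(preds, labels) if p == positive and y == positive)
    fn = sum(1 for p, y in zip(preds, labels) if p != positive and y == positive)
    fp = sum(1 for p, y in zip(preds, labels) if p == positive and y != positive)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


def accuracy(preds, labels):
    """Fraction of all samples classified correctly."""
    return sum(1 for p, y in zip(preds, labels) if p == y) / len(labels)


# 2 minority (positive) samples vs. 8 majority (negative) samples.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.2, 0.1, 0.1, 0.05, 0.0, 0.6]
preds = [1 if s >= 0.5 else 0 for s in scores]  # threshold chosen for illustration

print("AUC:", auc_rank(scores, labels))
print("recall, precision:", recall_precision(preds, labels))

# A trivial classifier that always predicts the majority class.
all_negative = [0] * len(labels)
print("all-negative accuracy:", accuracy(all_negative, labels))
print("all-negative recall, precision:", recall_precision(all_negative, labels))
```

The all-negative classifier reaches 0.8 accuracy yet 0.0 recall on the minority class. This is why, on imbalanced data, AUC and positive-class recall say more than overall accuracy alone.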