The modeling sample for a credit scoring model is imbalanced data, made up of bad customers (a rare event) and good customers (a common event). This paper therefore characterizes the difficulty of identifying rare events in terms of the variance of the model residuals. Borrowing methods that machine learning uses to handle imbalanced data, it applies special sampling to the rare events in the modeling sample before fitting the model, and it proves that after such sampling an empirical formula must be used to correct the resulting sample bias. Empirical analysis shows that this is an effective way to improve the accuracy of credit scoring models.
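The abstract does not state the correction formula itself. Purely as an illustration, the sketch below assumes the model is a logistic regression and uses the commonly cited prior correction for resampled rare-event data, which shifts the log-odds by ln[((1 - τ)/τ) · (ȳ/(1 - ȳ))], where τ is the bad-customer rate in the population and ȳ is the bad-customer rate in the resampled modeling sample. The function names `sampling_offset`, `corrected_intercept`, and `corrected_probability` are hypothetical, and this may not be the exact formula the paper derives.

```python
import math


def sampling_offset(pop_bad_rate: float, sample_bad_rate: float) -> float:
    """Log-odds shift introduced by resampling (assumed prior-correction form)."""
    return math.log(
        ((1 - pop_bad_rate) / pop_bad_rate)
        * (sample_bad_rate / (1 - sample_bad_rate))
    )


def corrected_intercept(b0_sampled: float, pop_bad_rate: float,
                        sample_bad_rate: float) -> float:
    """Move an intercept fitted on the resampled data back to the population scale."""
    return b0_sampled - sampling_offset(pop_bad_rate, sample_bad_rate)


def corrected_probability(p_sampled: float, pop_bad_rate: float,
                          sample_bad_rate: float) -> float:
    """Rescale a probability predicted by the resampled-data model."""
    logit = math.log(p_sampled / (1 - p_sampled))
    adjusted = logit - sampling_offset(pop_bad_rate, sample_bad_rate)
    return 1 / (1 + math.exp(-adjusted))


# Illustrative numbers only: 5% bad customers in the population,
# resampled to a 50/50 modeling sample.
print(round(corrected_probability(0.5, 0.05, 0.5), 4))  # 0.05
```

With these made-up rates, a score of 0.5 from the model fitted on the balanced sample maps back to 0.05, the population bad rate, which is the behavior a bias correction of this kind is meant to restore. Only the intercept is adjusted; the slope coefficients are left unchanged under this assumption.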