Customer churn data is a typical imbalanced dataset. The key to learning an effective classifier on it is to raise the recognition rate of the minority class (churned customers). The minority class is a special subsample, defined relative to the majority class, whose misclassification cost is very high, so reducing its misclassification rate is a pressing problem. Building on the support vector machine with different penalty factors proposed by Veropoulous, this paper uses the information entropy of the samples themselves to determine the penalty factors, which biases the model toward higher recognition accuracy on the minority class. The method is validated on an imbalanced telecommunications customer churn dataset. The results indicate that, compared with other methods, it substantially improves the recognition rate for churned customers (the minority class) and is of strong practical value.
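As a rough illustration of the idea, the sketch below derives one penalty factor per class from the class frequencies. The abstract does not state the actual entropy formula, so the rule used here is only an assumption: each class k receives a penalty proportional to its self-information, -log2(p_k), which is the per-class term whose expectation is the Shannon entropy. The function name `entropy_penalties`, the parameter `base_c`, and the toy label counts are all hypothetical.

```python
import math
from collections import Counter


def entropy_penalties(labels, base_c=1.0):
    """Return a per-class penalty factor for a different-penalty SVM.

    Assumed rule (not taken from the paper): C_k = base_c * -log2(p_k),
    where p_k is the fraction of samples in class k. Rarer classes get
    a larger penalty, so their misclassification costs more.
    """
    counts = Counter(labels)
    total = len(labels)
    return {
        cls: base_c * -math.log2(count / total)
        for cls, count in counts.items()
    }


# Toy labels (made up): 1 = churned (minority), 0 = retained (majority).
labels = [1] * 10 + [0] * 90
penalties = entropy_penalties(labels)

for cls in sorted(penalties):
    print(f"class {cls}: penalty factor {penalties[cls]:.4f}")
```

Under this assumed rule the minority class receives the larger penalty, which is the direction of bias the abstract describes. If scikit-learn is available, a dictionary of this shape could be passed as the `class_weight` argument of `sklearn.svm.SVC`, which scales `C` separately for each class.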