The C5.0 algorithm is an intuitive and efficient classification method, but it suffers from a complex information gain ratio calculation, a tendency to overfit, and bias in the resulting decision trees. To address these problems, this paper proposes an improved C5.0 algorithm that simplifies the calculation of the information gain ratio through formula transformation, bases pruning decisions on a combination of a loss matrix and a confidence interval, and adjusts the weights of the multiple models that are built. The improved algorithm is applied to the prediction of overdue credit. Experiments on borrowers' historical repayment data, with comparisons against other algorithms, show that the improved C5.0 algorithm achieves higher accuracy and efficiency than the other algorithms.
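
For orientation, the information gain ratio that the abstract refers to can be sketched in its standard textbook form, as used by C4.5-style trees. This is not the simplified formula proposed in the paper, which the abstract does not state; the function names `entropy` and `gain_ratio` and the toy labels are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Textbook gain ratio of splitting `labels` by a categorical attribute.

    gain_ratio = (H(labels) - H(labels | attribute)) / H(attribute)
    This is the standard definition, not the paper's simplified variant.
    """
    n = len(labels)
    # Group the class labels by attribute value.
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted entropy of the labels after the split.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    # Split information: entropy of the attribute values themselves.
    split_info = entropy(values)
    # An attribute with a single value carries no split information.
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: an attribute that separates the classes perfectly.
ratio = gain_ratio(["a", "a", "b", "b"],
                   ["overdue", "overdue", "paid", "paid"])
```

The repeated logarithm evaluations in `entropy` are the cost that a formula transformation of the kind the abstract describes would aim to reduce.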