To address the long classification response times and low accuracy that result when a pruning algorithm relies on pre-pruning or post-pruning alone, this paper proposes a decision-tree pruning optimization algorithm that combines Child Difference Choose (CDC) with Reduced Error Pruning (REP), building on REP post-pruning. While the decision tree is being generated, the CDC algorithm uses the difference ratio between the left and right child nodes to exclude some non-leaf nodes; once the tree has been generated, REP prunes it further. Experimental results show that the algorithm avoids the over-fitting caused by an overly fine-grained generation process in large decision trees and that, compared with other algorithms, it reduces splitting time and improves decision-tree accuracy.
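The two-stage idea described above can be illustrated with a minimal sketch. The REP step follows the standard textbook procedure: working bottom-up, an internal node is replaced by a leaf whenever that does not increase the error on a held-out validation set. The abstract does not define the CDC difference ratio, so `child_difference_ratio` is only a hypothetical stand-in (half the L1 distance between the class distributions of the two children), and the `Node` class and all other names here are assumptions made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, class label)


@dataclass
class Node:
    label: int                       # majority training class at this node
    feature: Optional[int] = None    # split feature index; None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.feature is None


def predict(node: Node, x: Sequence[float]) -> int:
    """Route x down the tree and return the label of the leaf it reaches."""
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


def child_difference_ratio(left_counts: List[int], right_counts: List[int]) -> float:
    """HYPOTHETICAL stand-in for the CDC criterion (not defined in the abstract).

    Returns half the L1 distance between the children's class distributions:
    0.0 means the children look identical, 1.0 means they share no class.
    A growth routine could refuse to split when this falls below a threshold.
    """
    n_left, n_right = sum(left_counts), sum(right_counts)
    if n_left == 0 or n_right == 0:
        return 0.0
    return 0.5 * sum(abs(l / n_left - r / n_right)
                     for l, r in zip(left_counts, right_counts))


def rep_prune(node: Node, val: List[Sample]) -> Node:
    """Reduced Error Pruning: collapse a subtree to a leaf if validation error does not rise."""
    if node.is_leaf():
        return node
    # Send each validation sample to the child it would reach.
    left_val = [(x, y) for x, y in val if x[node.feature] <= node.threshold]
    right_val = [(x, y) for x, y in val if x[node.feature] > node.threshold]
    node.left = rep_prune(node.left, left_val)
    node.right = rep_prune(node.right, right_val)
    subtree_errors = sum(predict(node, x) != y for x, y in val)
    leaf_errors = sum(node.label != y for _, y in val)
    if leaf_errors <= subtree_errors:
        return Node(label=node.label)  # the simpler leaf is at least as good
    return node


# Small example: the right subtree over-fits and should collapse to a leaf.
tree = Node(label=0, feature=0, threshold=0.5,
            left=Node(label=0),
            right=Node(label=1, feature=1, threshold=0.5,
                       left=Node(label=1), right=Node(label=0)))
validation = [([0.2, 0.1], 0), ([0.8, 0.2], 1), ([0.9, 0.9], 1), ([0.7, 0.8], 1)]
pruned = rep_prune(tree, validation)
```

In this sketch a pre-pruning check based on `child_difference_ratio` would run during tree growth, and `rep_prune` would run once on the finished tree, which mirrors the order of the two stages in the abstract.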