Decision trees are a widely used classification method. This paper proposes an improved decision tree algorithm based on ID3. The improvement targets the case in which the training set contains instances that share the same attribute values but belong to different classes. Instead of assigning the class of the leaf node by majority vote, the improved ID3 algorithm determines the class with a new method that combines information-gain-based attribute reduction with minimum distance classification. Experimental results show that the improved algorithm is effective at optimizing the structure of the decision tree and improving classification accuracy.
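To make the setting concrete, the following minimal Python sketch computes the information gain that standard ID3 uses to choose split attributes, and detects the conflict case the abstract describes: identical attribute values with different class labels. It shows only the majority-vote baseline that the proposed method replaces. The abstract does not specify the attribute reduction or minimum distance steps, so they are not reproduced here. The function names and the toy data are illustrative assumptions, not taken from the paper.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on attribute index `attr`."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[attr]].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def conflicting_groups(rows, labels):
    """Return attribute tuples that occur with more than one class,
    mapped to their per-class counts."""
    seen = defaultdict(Counter)
    for row, label in zip(rows, labels):
        seen[tuple(row)][label] += 1
    return {key: counts for key, counts in seen.items() if len(counts) > 1}


def majority_vote(counts):
    """Baseline rule in standard ID3: pick the most frequent class."""
    return counts.most_common(1)[0][0]


# Hypothetical toy data: columns are (outlook, temperature).
rows = [
    ("sunny", "hot"),
    ("sunny", "hot"),
    ("sunny", "hot"),
    ("rainy", "mild"),
    ("rainy", "cool"),
]
labels = ["no", "no", "yes", "yes", "yes"]

gains = [information_gain(rows, labels, a) for a in range(2)]
conflicts = conflicting_groups(rows, labels)
baseline = {key: majority_vote(counts) for key, counts in conflicts.items()}
```

In this toy set, the three `("sunny", "hot")` instances cannot be separated by any further split, so a standard ID3 leaf would fall back to majority vote and label them all `"no"`, misclassifying one of them. This is the situation in which the improved algorithm applies its alternative class decision.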