The traditional ID3 algorithm depends too heavily on attributes with many values when it chooses splitting attributes. To overcome this shortcoming, a decision tree optimization algorithm based on principal component analysis (PCA) is proposed. The algorithm uses PCA to combine information gain and the correlation coefficient into a single criterion for ranking candidate splitting attributes. The optimized algorithm is tested on standard data sets provided by UCI, and its performance characteristics are analyzed. The results show that it outperforms ID3 in both classification accuracy and execution efficiency.
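The abstract does not define its correlation coefficient or its exact PCA scoring rule. The sketch below illustrates the general idea under explicitly assumed choices, not the authors' method. It assumes that the correlation coefficient is the absolute Pearson correlation between integer-coded attribute values and integer-coded class labels. It also assumes that both criteria are z-score standardized across candidate attributes and that the first principal component score serves as the combined criterion. All function names (`information_gain`, `abs_pearson`, `pca_split_attribute`, and so on) are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """ID3 information gain from splitting `labels` on the values in `column`."""
    n = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def encode(values):
    """Assumed integer coding of categorical values (sorted order)."""
    codes = {v: i for i, v in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]


def abs_pearson(xs, ys):
    """Absolute Pearson r (assumed correlation measure); 0.0 if a side is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy / math.sqrt(sxx * syy))


def zscore(values):
    """Standardize a list to zero mean and unit (population) variance."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [0.0 if sd == 0 else (v - mean) / sd for v in values]


def first_component(zg, zc):
    """Leading eigenvector of the 2x2 covariance matrix of (zg, zc)."""
    n = len(zg)
    a = sum(g * g for g in zg) / n
    d = sum(c * c for c in zc) / n
    b = sum(g * c for g, c in zip(zg, zc)) / n
    lam = (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + b * b)
    if b != 0:
        # (lam - d, b) solves (A - lam*I)v = 0; lam - d > 0, so the gain loading is positive.
        vx, vy = lam - d, b
    else:
        # Uncorrelated criteria: fall back to the higher-variance axis.
        vx, vy = (1.0, 0.0) if a >= d else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return vx / norm, vy / norm


def pca_split_attribute(rows, labels):
    """Index of the attribute with the highest first-PC score of (gain, |r|)."""
    columns = list(zip(*rows))
    y = encode(labels)
    gains = [information_gain(col, labels) for col in columns]
    corrs = [abs_pearson(encode(col), y) for col in columns]
    zg, zc = zscore(gains), zscore(corrs)
    wg, wc = first_component(zg, zc)
    scores = [wg * g + wc * c for g, c in zip(zg, zc)]
    return max(range(len(scores)), key=scores.__getitem__)


# Attribute 0 predicts the class, attribute 1 is an ID-like column with a
# distinct value per row, and attribute 2 is noise.
rows = [("x", "b", "p"), ("x", "d", "q"), ("y", "a", "p"), ("y", "c", "q")]
labels = ["no", "no", "yes", "yes"]
print(pca_split_attribute(rows, labels))  # prints 0
```

In this toy data, plain information gain ties attribute 0 with the ID-like attribute 1 (both have a gain of 1 bit). The assumed correlation term breaks the tie in favor of attribute 0, which is the kind of bias toward many-valued attributes that the abstract targets. Integer coding makes Pearson's r depend on the arbitrary value order for attributes with more than two values, so a real implementation might prefer an order-independent association measure.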