The Impact of Merging Branches on Decision Tree Induction
  • Journal: Chinese Journal of Computers (计算机学报), 30:8 (2007) 1251-1258
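  • Classification: TP181 [Automation and Computer Technology: Control Science and Engineering; Automation and Computer Technology: Control Theory and Control Engineering]
  • Author affiliation: [1] College of Mathematics and Computer Science, Hebei University, Baoding, Hebei 071002
  • Funding: Supported by the National Natural Science Foundation of China (60473045, 60573069)
  • Related project: Research on the generalization capability of weighted fuzzy rules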
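Chinese abstract (translated):

Because of the inductive bias in how they select expanded attributes, traditional decision tree construction methods always give priority to attributes with many values. This makes the tree too large and lowers its generalization capability, so the tree needs to be simplified. Pruning is one form of simplification and is divided into pre-pruning and post-pruning. This paper studies branch merging within pre-pruning. It investigates the impact of branch merging on decision tree induction, and specifically discusses whether, during tree generation, applying an appropriate branch-merging strategy can enhance the comprehensibility of the tree, reduce its complexity, and improve its generalization accuracy. Based on information gain, it analyzes the complexity of the decision tree after branch merging, and designs and implements SSID, a branch-merging algorithm based on the positive-example ratio, and MCID, a branch-merging algorithm based on maximum gain compensation. Experimental results show that the decision trees produced by SSID and MCID are clearly better than those of See5 in both comprehensibility and generalization accuracy.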

English abstract:

Because of the inductive bias in selecting expanded attributes, traditional decision tree induction tends to prefer attributes with many values. This results in large trees with poor generalization capability, so the tree needs to be simplified. Pruning, in the form of pre-pruning or post-pruning, is one way to simplify it. This paper focuses on pre-pruning and proposes a new pre-pruning strategy: during tree growth, two or more branches from the same node are merged into one branch, and growth then continues. The paper investigates the impact of merging branches on decision tree induction. The main question is whether the comprehensibility, size, and generalization accuracy of a decision tree can be improved when an appropriate merging strategy is selected and applied. Based on information gain, the paper analyzes the complexity of a decision tree before and after merging branches and designs two branch-merging algorithms: SSID, based on the proportion of positive samples, and MCID, based on the maximum gain compensation. Experimental results show that, with respect to comprehensibility and generalization capability, both SSID and MCID are significantly superior to the frequently used See5 system (the improved version of C4.5).
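
The abstract does not give the merging criteria used by SSID or MCID, so the following is only a minimal sketch of the quantity it builds on: the information gain of a split before and after two branches of the same node are merged. The function names (`entropy`, `information_gain`, `merge_branches`) and the toy data are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def information_gain(branches):
    """Information gain of splitting the pooled labels into `branches`.

    `branches` maps each attribute value (or merged group of values)
    to the list of class labels that fall into that branch.
    """
    pooled = [label for labels in branches.values() for label in labels]
    total = len(pooled)
    remainder = sum(len(labels) / total * entropy(labels)
                    for labels in branches.values() if labels)
    return entropy(pooled) - remainder


def merge_branches(branches, a, b):
    """Return a new split in which branches `a` and `b` share one branch."""
    merged = {k: v for k, v in branches.items() if k not in (a, b)}
    merged[(a, b)] = branches[a] + branches[b]
    return merged


# Toy data (hypothetical): a three-valued attribute with binary class labels.
split = {
    "low": ["+", "+", "+", "-"],
    "mid": ["+", "+", "+", "-"],
    "high": ["-", "-", "-", "-"],
}
merged = merge_branches(split, "low", "mid")
gain_before = information_gain(split)
gain_after = information_gain(merged)
```

In this toy split, "low" and "mid" have the same proportion of positive examples, so merging them removes one branch without changing the information gain. Merging "mid" with "high" instead would lower the gain, because their class proportions differ. A merging strategy has to weigh that kind of loss against the reduction in tree size.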
