This paper presents a combined-optimization decision tree algorithm suited to large-scale, high-dimensional databases. Compared with traditional algorithms of this kind, it improves three stages: data discretization, dimensionality reduction, and attribute selection. These are the main steps of decision tree construction that adapt poorly to large-scale, high-dimensional databases, and optimizing them effectively resolves the conflict between efficiency and accuracy on such data. Simulation experiments show that the algorithm improves the classification accuracy of the decision tree while greatly reducing the computational cost.
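
The abstract does not specify how each of the three stages is carried out, so the following is only a minimal illustrative sketch of how discretization, dimensionality reduction, and attribute selection might be chained before tree construction. It assumes equal-width binning, information gain as the selection criterion, and a gain threshold as a crude form of dimensionality reduction; these are generic textbook choices, not the paper's actual method. The function names and the `n_bins` and `min_gain` parameters are hypothetical.

```python
import math
from collections import Counter


def equal_width_bins(values, n_bins):
    """Discretization (assumed scheme): map each value to a bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; put everything in one bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(column, labels):
    """Entropy reduction obtained by splitting the labels on a discrete column."""
    total = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_attributes(columns, labels, n_bins=3, min_gain=0.01):
    """Discretize each column, drop columns whose gain is below min_gain
    (a hypothetical stand-in for dimensionality reduction), and rank the rest
    by information gain, highest first."""
    gains = {}
    for name, values in columns.items():
        gain = information_gain(equal_width_bins(values, n_bins), labels)
        if gain >= min_gain:
            gains[name] = gain
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy data: "signal" separates the two classes, "noise" is constant.
    columns = {"signal": [0.1, 0.2, 0.9, 1.0], "noise": [5.0, 5.0, 5.0, 5.0]}
    labels = ["a", "a", "b", "b"]
    print(rank_attributes(columns, labels, n_bins=2))
```

In this toy run the constant `noise` column falls below the threshold and is discarded, so only `signal` remains as a candidate split attribute. Pruning attributes before the tree is grown is one plausible way to cut computation on high-dimensional data, which is the trade-off the abstract describes.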