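A family of discretization algorithms based on information theory is analyzed, and on this basis a new method for discretizing continuous attributes is proposed. The algorithm uses information divergence to measure the importance of candidate cut points, and uses the inconsistency rate to control the discretization process so that the consistency of the decision table does not change. Finally, the algorithm is compared with other algorithms using C4.5 and support vector machines (SVM), which verifies its effectiveness.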
Discretization of continuous attributes contributes substantially to the subsequent machine learning or data mining process. A new discretization algorithm based on information divergence is proposed. The discretization procedure is controlled by an inconsistency check. Experiments are performed with C4.5 and SVM on the discretized data. The results show that the proposed algorithm is effective.
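The inconsistency check mentioned above can be illustrated with a minimal sketch. It assumes the commonly used definition of the inconsistency rate: within each group of samples that share the same discretized condition values, every sample outside the majority class counts as inconsistent. The function names `discretize` and `inconsistency_rate` are hypothetical and not taken from the paper, and the information divergence measure itself is not reproduced here because the abstract does not give its formula.

```python
from bisect import bisect_right
from collections import Counter, defaultdict


def discretize(values, cuts):
    """Map each continuous value to the index of its interval.

    `cuts` must be a sorted list of cut points (an assumption of this sketch).
    """
    return [bisect_right(cuts, v) for v in values]


def inconsistency_rate(rows, labels):
    """Fraction of samples that disagree with the majority label
    among samples sharing the same discretized condition values."""
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        groups[tuple(row)][label] += 1
    total = sum(sum(c.values()) for c in groups.values())
    if total == 0:
        return 0.0
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inconsistent / total


# Toy decision table with a single continuous condition attribute.
values = [1.0, 2.0, 3.0, 4.0]
labels = ["a", "a", "b", "b"]

for cuts in ([2.5], [1.5], []):
    rows = [[code] for code in discretize(values, cuts)]
    print(cuts, inconsistency_rate(rows, labels))
```

In this toy table, the cut point 2.5 separates the two classes and keeps the table consistent, whereas removing all cut points merges conflicting samples into one interval. A discretization procedure controlled in this way would reject any merge that raises the rate above that of the original table.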