In data mining and machine learning research, many algorithms operate on discrete values, so continuous attributes often need to be discretised first. Organised around the distinction between supervised and unsupervised discretisation, this paper summarises the basic ideas of typical discretisation algorithms and compares them in terms of time complexity and their effect on subsequent classification. Finally, it outlines several main research directions for the discretisation of continuous attributes.