This paper examines four classes of discretization algorithms for continuous attributes: greedy algorithms and their improved variants, algorithms based on attribute significance, algorithms based on information entropy, and clustering-based algorithms. Experiments are then used to compare the discretization quality of the four classes. The results indicate that the quality of a dataset's discretization depends not only on the algorithm used, but also closely on the distribution of the continuous attributes and on the classification of the decision values.