Based on an analysis of the cut-point characteristics of the entropy-based discretization criterion, an attribute discretization algorithm based on merging boundary-point attribute values and checking inconsistency is proposed and proved. Compared with traditional discretization algorithms, the proposed method merges only the attribute values of boundary points, generates the number of cut points automatically rather than requiring it to be set in advance, and applies simple rules to merge intervals, which greatly reduces the computational cost. It is therefore suitable for discretizing large-scale, high-dimensional databases. Because inconsistency is used to adjust the candidate cut-point set, the algorithm also has a global character. Experiments show that the method effectively improves the simplicity and the prediction accuracy of classification rules.
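The abstract does not give the merging rules or the inconsistency threshold, so the following is only a minimal sketch of the two general notions it names, not the authors' algorithm. It assumes the usual definition of a boundary point (a midpoint between adjacent distinct attribute values that are not both pure in the same class) and the usual inconsistency rate (the fraction of samples that disagree with the majority class among samples sharing the same discretized attribute pattern). The function names `boundary_cut_points`, `discretize`, and `inconsistency_rate` are hypothetical.

```python
from bisect import bisect_right
from collections import Counter, defaultdict


def boundary_cut_points(values, labels):
    """Return candidate cut points for one numeric attribute.

    Assumption: a cut is placed at the midpoint of two adjacent distinct
    values unless both values are pure and belong to the same class.
    """
    classes_at = defaultdict(set)
    for v, y in zip(values, labels):
        classes_at[v].add(y)
    distinct = sorted(classes_at)
    cuts = []
    for lo, hi in zip(distinct, distinct[1:]):
        a, b = classes_at[lo], classes_at[hi]
        if not (len(a) == 1 and a == b):
            cuts.append((lo + hi) / 2)
    return cuts


def discretize(value, cuts):
    """Map a value to the index of the interval defined by sorted cuts."""
    return bisect_right(cuts, value)


def inconsistency_rate(rows, labels):
    """Fraction of samples not in the majority class of their pattern.

    Each row is a sequence of discretized attribute values; rows with an
    identical pattern but different labels count as inconsistent.
    """
    groups = defaultdict(Counter)
    for row, y in zip(rows, labels):
        groups[tuple(row)][y] += 1
    mismatched = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return mismatched / len(labels)


# Toy data (illustrative only, not from the paper).
values = [1, 2, 3, 4, 5, 6]
labels = ["a", "a", "b", "b", "a", "a"]

cuts = boundary_cut_points(values, labels)
rows = [(discretize(v, cuts),) for v in values]
print(cuts, inconsistency_rate(rows, labels))
```

On this toy attribute only two boundary cuts survive, so the number of intervals follows from the data rather than from a preset parameter. A global check in the spirit of the abstract could then drop or restore candidate cuts across all attributes while the inconsistency rate stays below a chosen threshold; that threshold and the adjustment order are not specified in the source.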