Rough set methods and most mainstream machine learning algorithms cannot handle continuous data efficiently. To address this, this paper proposes an improved discretization algorithm for continuous data based on CACC. The algorithm selects cut points using the CACC criterion and adds a constraint on the data inconsistency rate, which reduces the amount of information lost during discretization. In simulations, the improved algorithm is compared with the Modified Chi2, Extended Chi2, CAIM, and CACC algorithms, with the discretized data validated using the C4.5 and SVM classifiers. The results show that it raises the data recognition rate and accuracy by up to nearly 8%.
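The two quantities named in the abstract can be illustrated with a minimal sketch. It assumes the commonly cited form of the CACC criterion, cacc = sqrt(y / (y + M)) with y = M(Σ q_ir² / (M_i+ · M_+r) − 1) / ln(n), and the usual definition of the inconsistency rate (samples outside the majority class of their discretized pattern, divided by the total). The function names `cacc_value`, `discretize`, and `inconsistency_rate` are hypothetical, and how the improved algorithm combines the two measures is not specified in the abstract, so this is not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


def discretize(values, cuts):
    """Map each value to the index of its interval (-inf, t1], (t1, t2], ..."""
    cuts = sorted(cuts)
    return [sum(v > t for t in cuts) for v in values]


def cacc_value(values, labels, cuts):
    """CACC value of one attribute under the given cut points (assumed formula)."""
    n = len(cuts) + 1                 # number of intervals
    if n < 2:
        return 0.0                    # ln(1) = 0, so the criterion is undefined
    m = len(values)                   # total number of samples
    quanta = Counter(zip(labels, discretize(values, cuts)))  # q_ir
    class_total = Counter()           # M_i+
    interval_total = Counter()        # M_+r
    for (c, r), q in quanta.items():
        class_total[c] += q
        interval_total[r] += q
    s = sum(q * q / (class_total[c] * interval_total[r])
            for (c, r), q in quanta.items())
    y = max(0.0, m * (s - 1.0) / math.log(n))  # guard against rounding below 0
    return math.sqrt(y / (y + m))


def inconsistency_rate(rows, labels):
    """Share of samples not in the majority class of their discretized pattern."""
    groups = defaultdict(Counter)
    for row, c in zip(rows, labels):
        groups[tuple(row)][c] += 1
    mismatched = sum(sum(cnt.values()) - max(cnt.values())
                     for cnt in groups.values())
    return mismatched / len(labels)


# Toy data: a cut at 2.5 separates the classes, a cut at 1.5 does not.
values = [1.0, 2.0, 3.0, 4.0]
labels = [0, 0, 1, 1]
good = cacc_value(values, labels, [2.5])
bad = cacc_value(values, labels, [1.5])
rate_bad = inconsistency_rate([(i,) for i in discretize(values, [1.5])], labels)
```

On this toy data the separating cut scores a higher CACC value than the non-separating one, and the non-separating cut leaves one of four samples inconsistent. A constraint of the kind the abstract describes would reject or refine a scheme whose inconsistency rate exceeds some chosen threshold.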