Abstract FD-CABOSFV, an improved CABOSFV algorithm based on fuzzy discretization, is proposed for clustering high-dimensional data whose attributes are interval-scaled variables. For each attribute combination, it discretizes the attribute values using the idea of fuzzy C-means clustering. It then determines each object's membership in the discretized attribute categories by a λ-cut, which turns the attribute values into binary variables. Finally, it clusters the objects with the CABOSFV algorithm. Three UCI benchmark data sets were used to compare FD-CABOSFV with the well-known K-means clustering algorithm, and the experimental results show that FD-CABOSFV is more effective.
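The discretization step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It runs fuzzy C-means (FCM) separately on each single numeric attribute, whereas the abstract speaks of attribute combinations whose grouping it does not specify. It then applies a λ-cut (membership ≥ λ) to obtain one binary column per fuzzy interval. The function names `fuzzy_c_means_1d`, `lambda_cut` and `fuzzy_discretize`, and the parameters `c`, `m` and `lam`, are hypothetical. The resulting binary matrix is what a CABOSFV-style clusterer would consume; CABOSFV itself is not shown.

```python
import random

def fuzzy_c_means_1d(values, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Fuzzy C-means on one numeric attribute (illustrative sketch).
    Returns (centers, u), where u[k][i] is the membership of values[k] in fuzzy interval i."""
    rng = random.Random(seed)
    n = len(values)
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        u.append([x / s for x in row])
    centers = [0.0] * c
    p = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        # Center update: mean of the values weighted by u^m.
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            centers[i] = sum(wk * xk for wk, xk in zip(w, values)) / sum(w)
        # Membership update: u_ki = 1 / sum_j (d_ki / d_kj)^(2/(m-1)).
        new_u = []
        for k in range(n):
            d = [abs(values[k] - centers[i]) for i in range(c)]
            if any(di == 0.0 for di in d):
                # The value coincides with a center, so its membership is crisp.
                row = [1.0 if di == 0.0 else 0.0 for di in d]
                s = sum(row)
                row = [x / s for x in row]
            else:
                row = [1.0 / sum((d[i] / d[j]) ** p for j in range(c)) for i in range(c)]
            new_u.append(row)
        delta = max(abs(new_u[k][i] - u[k][i]) for k in range(n) for i in range(c))
        u = new_u
        if delta < tol:
            break
    return centers, u

def lambda_cut(memberships, lam):
    """λ-cut: an object belongs to interval i (bit 1) iff its membership is >= lam."""
    return [[1 if mu >= lam else 0 for mu in row] for row in memberships]

def fuzzy_discretize(data, c, lam, **fcm_kwargs):
    """Turn an n x d matrix of interval-scaled values into an n x (d*c) binary matrix."""
    binary = [[] for _ in data]
    for j in range(len(data[0])):
        column = [row[j] for row in data]
        _, u = fuzzy_c_means_1d(column, c, **fcm_kwargs)
        for k, bits in enumerate(lambda_cut(u, lam)):
            binary[k].extend(bits)
    return binary

if __name__ == "__main__":
    data = [[1.0, 10.0], [1.2, 10.5], [5.0, 2.0], [5.1, 2.2]]
    print(fuzzy_discretize(data, c=2, lam=0.5))
```

The choice of λ controls how crisp the conversion is. With λ ≤ 1/c every object receives at least one 1 per attribute, because its largest membership is at least 1/c. With λ > 0.5 it receives at most one 1, since at most one membership can exceed 0.5 when they sum to 1.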