Most existing high-performing clustering algorithms are designed for low-dimensional data. They break down on high-dimensional data, whose distribution characteristics differ substantially from the low-dimensional case. To address the clustering of high-dimensional categorical data, a rough-set-based subspace clustering algorithm for high-dimensional categorical data is proposed. The class boundary is described by the upper and lower approximations of rough set theory, which determine the extent of the boundary region, and a consistency degree is then used to adjust the class boundary. The clustering process follows the idea of growing subspaces, iteratively searching for subspace clusters from low dimensions to high dimensions. Finally, comparative experiments on the soybean and zoo data sets show that the algorithm is not only feasible but also achieves high accuracy.
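The role of the upper and lower approximations can be illustrated with a minimal sketch. This is not the authors' algorithm: it only shows the standard rough-set approximations of a candidate class under the indiscernibility relation induced by a chosen attribute subset (a subspace). The function names, the toy `rows` data, and the `target` set are hypothetical and introduced purely for illustration; the consistency-degree adjustment is not modeled.

```python
from collections import defaultdict


def equivalence_classes(rows, attrs):
    """Group row indices that share the same values on the attributes in `attrs`."""
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())


def approximations(rows, attrs, target):
    """Return (lower, upper) approximations of `target` in the subspace `attrs`."""
    lower, upper = set(), set()
    for block in equivalence_classes(rows, attrs):
        if block <= target:   # block lies entirely inside the candidate class
            lower |= block
        if block & target:    # block overlaps the candidate class
            upper |= block
    return lower, upper


# Hypothetical categorical records: (colour, shape)
rows = [
    ("red", "round"),
    ("red", "round"),
    ("red", "oval"),
    ("green", "oval"),
]
target = {0, 2}  # hypothetical candidate class, given as row indices

# Grow the subspace from one attribute to two and observe the boundary region.
for attrs in ([0], [0, 1]):
    lower, upper = approximations(rows, attrs, target)
    print(attrs, "lower:", lower, "upper:", upper, "boundary:", upper - lower)
```

In this toy data, the one-attribute subspace leaves the lower approximation empty, while adding the second attribute places row 2 in the lower approximation and shrinks the boundary region to rows 0 and 1. That mirrors, in a very simplified way, the idea of searching from low-dimensional to higher-dimensional subspaces.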