The fuzzy C-means (FCM) algorithm is an important clustering-analysis algorithm. On high-dimensional data, however, its computational cost rises sharply, which makes it inefficient. To address this problem, a fuzzy C-means clustering algorithm for high-dimensional data based on rough set theory is proposed. The algorithm brings the idea of rough-set attribute reduction into traditional FCM. Reducing the attributes of the data set extracts the attributes that strongly influence the classification and discards those unrelated to it. The clustering process then computes only over the attributes in the reduction result, which reduces the workload of clustering and improves its efficiency. Theoretical analysis and experimental results show that the algorithm is more efficient when processing high-dimensional data.
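The two-stage pipeline described above first reduces the attributes with rough sets and then runs FCM on the reduct only. It can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say which reduction algorithm is used, so the sketch substitutes the common greedy QuickReduct heuristic based on the rough-set dependency degree γ. It also assumes that the condition attributes are already discretized and that a decision attribute (a class label, e.g. from a labeled sample) is available to drive the reduction. All function names (`partition`, `dependency`, `quick_reduct`, `fcm`, `rough_fcm`) are hypothetical.

```python
import math
import random
from collections import defaultdict


def partition(rows, attrs):
    """Equivalence classes of IND(attrs): indices of objects that share
    the same values on every attribute in `attrs`."""
    blocks = defaultdict(list)
    for idx, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(idx)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Rough-set dependency degree gamma_attrs(D) = |POS_attrs(D)| / |U|,
    i.e. the share of objects whose equivalence class has a single label."""
    pos = sum(len(block) for block in partition(rows, attrs)
              if len({labels[i] for i in block}) == 1)
    return pos / len(rows)


def quick_reduct(rows, labels):
    """Greedy QuickReduct (an assumed stand-in, not necessarily the paper's
    method): add the attribute that raises gamma most until gamma matches
    that of the full attribute set. The result may not be minimal."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    reduct = []
    while dependency(rows, labels, reduct) < target:
        best = max((a for a in all_attrs if a not in reduct),
                   key=lambda a: dependency(rows, labels, reduct + [a]))
        reduct.append(best)
    return sorted(reduct)


def fcm(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy C-means; u[k][i] is the membership of point k in cluster i."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    u = []
    for _ in range(n):  # random initial memberships, each row summing to 1
        w = [rng.random() for _ in range(c)]
        total_w = sum(w)
        u.append([x / total_w for x in w])
    for _ in range(max_iter):
        # Center update: v_i = sum_k u_ik^m x_k / sum_k u_ik^m
        centers = []
        for i in range(c):
            wts = [u[k][i] ** m for k in range(n)]
            total = sum(wts)
            centers.append([sum(wts[k] * points[k][d] for k in range(n)) / total
                            for d in range(dim)])
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))
        new_u = []
        for k in range(n):
            dists = [math.dist(points[k], v) for v in centers]
            if min(dists) == 0.0:  # point sits on a center: crisp membership there
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_u.append([h / sum(hits) for h in hits])
            else:
                new_u.append([1.0 / sum((dists[i] / dj) ** (2 / (m - 1)) for dj in dists)
                              for i in range(c)])
        shift = max(abs(new_u[k][i] - u[k][i]) for k in range(n) for i in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u


def rough_fcm(rows, labels, c, **fcm_kwargs):
    """Reduce attributes first, then cluster using only the reduct's columns."""
    reduct = quick_reduct(rows, labels)
    projected = [[float(row[a]) for a in reduct] for row in rows]
    centers, u = fcm(projected, c, **fcm_kwargs)
    return reduct, centers, u


if __name__ == "__main__":
    # Toy table: attribute 0 determines the label; attributes 1 and 2 are noise.
    rows = [[1, 0, 5], [1, 1, 5], [2, 0, 7], [2, 1, 7],
            [9, 0, 5], [9, 1, 5], [10, 0, 7], [10, 1, 7]]
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    reduct, centers, u = rough_fcm(rows, labels, c=2)
    print("reduct:", reduct)  # reduct: [0]
    print("centers:", centers)
```

Each FCM iteration costs time proportional to the number of attributes, because every distance and every center coordinate is a sum or average over dimensions. Clustering on the reduct instead of the full attribute set is therefore where the savings come from, while the reduction is a one-off cost. On real data the reduction would typically run on a discretized copy of the table while FCM uses the original numeric values of the selected columns. The sketch reuses the same integer values for both only to stay short.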