This paper proposes the extended set dissimilarity, which measures the overall dissimilarity among multiple sets, together with related theorems. It also presents a new algorithm for clustering high-dimensional data with categorical attributes: the clustering algorithm based on extended set dissimilarity for categorical attributes (CAESD). Building on the results of CABOSFV_C clustering, CAESD completes the clustering process in two stages using the extended set dissimilarity and the extended set feature vector. Comparative experiments on UCI data sets against the K-modes algorithm, improved variants of K-modes, and the CABOSFV_C algorithm show that CAESD achieves higher clustering accuracy.
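The abstract does not define the extended set dissimilarity, so the sketch below shows only the kind of set-level measure that CABOSFV-style algorithms are usually described with. It illustrates how a cluster can be summarized by a mergeable feature vector and scored as a whole set. The assumptions are as follows. Each object is a one-hot set of hypothetical "attribute=value" items. The feature vector is (size, S, NS): S holds the items shared by every member, and NS holds the items present in some members but not all. The dissimilarity is |NS| / (size · |S|), and it is infinite when S is empty. All function names (`sfv_of`, `merge`, `dissimilarity`, `one_pass_cluster`) and the `threshold` parameter are hypothetical. This is not the paper's extended set dissimilarity, and the second clustering stage of CAESD is not shown.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical feature vector: (number of objects, items shared by all, items shared by some but not all).
# This is a CABOSFV-style illustration, NOT the extended set dissimilarity defined in the paper.
SFV = Tuple[int, FrozenSet[str], FrozenSet[str]]


def sfv_of(obj: FrozenSet[str]) -> SFV:
    # A single object shares all of its items with itself and has no partial items.
    return (1, frozenset(obj), frozenset())


def merge(x: SFV, y: SFV) -> SFV:
    # Items stay "shared" only if both sides share them; every other item seen becomes partial.
    shared = x[1] & y[1]
    partial = (x[1] | y[1] | x[2] | y[2]) - shared
    return (x[0] + y[0], shared, partial)


def dissimilarity(v: SFV) -> float:
    # Assumed set-level score: partial items per (member x shared item); no shared items -> infinite.
    n, shared, partial = v
    if not shared:
        return float("inf")
    return len(partial) / (n * len(shared))


def one_pass_cluster(objects: List[FrozenSet[str]], threshold: float) -> List[List[int]]:
    # Single pass: put each object into the cluster whose merged score is lowest,
    # or open a new cluster if that score exceeds the (hypothetical) threshold.
    clusters: List[Tuple[SFV, List[int]]] = []
    for idx, obj in enumerate(objects):
        v = sfv_of(obj)
        best, best_d = None, float("inf")
        for ci, (cv, _) in enumerate(clusters):
            d = dissimilarity(merge(cv, v))
            if d < best_d:
                best, best_d = ci, d
        if best is not None and best_d <= threshold:
            cv, members = clusters[best]
            clusters[best] = (merge(cv, v), members + [idx])
        else:
            clusters.append((v, [idx]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    data = [
        frozenset({"color=red", "shape=round"}),
        frozenset({"color=red", "shape=round"}),
        frozenset({"color=blue", "shape=square"}),
        frozenset({"color=red", "shape=square"}),
    ]
    print(one_pass_cluster(data, 0.5))  # [[0, 1], [2], [3]]
```

Summarizing a cluster by (size, S, NS) lets a merge be scored without revisiting member objects. This property is what makes a single pass over high-dimensional sparse data cheap. The result depends on input order and on the threshold. For example, raising `threshold` to 1.0 in the usage above places object 3 with objects 0 and 1.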