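Processing high-dimensional categorical data has long been a major challenge in data mining research. Traditional clustering algorithms mainly target low-dimensional continuous data and have difficulty handling high-dimensional datasets with categorical attributes. This paper proposes a subspace clustering algorithm for high-dimensional categorical datasets, FPSUB (FP-Tree-based SUBspace clustering algorithm). It uses a frequent pattern tree to transform the clustering problem into the problem of discovering frequent patterns of attribute values; the resulting frequent patterns serve as candidate subspaces, and clustering is then performed on these subspaces. Experimental results on real datasets show that FPSUB achieves higher accuracy than other algorithms.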
High-dimensional categorical datasets play an important role, so clustering them is of practical significance. However, traditional clustering algorithms mainly target low-dimensional continuous datasets and have difficulty dealing with categorical datasets. A new subspace clustering algorithm, FPSUB, is proposed. It stores the information of a dataset in an FP-Tree structure, which transforms the clustering problem into finding frequent patterns, and then uses these patterns to cluster the objects. The experimental results demonstrate the feasibility and robustness of this algorithm.
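
The core idea, treating frequent patterns of attribute values as candidate subspaces, can be illustrated with a minimal Python sketch. This is not the FPSUB implementation: for brevity it counts patterns by brute-force enumeration instead of building an FP-Tree, so it is only practical for a handful of attributes. The function names `frequent_patterns` and `candidate_subspaces`, the `min_support` threshold, and the toy records are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_patterns(records, min_support):
    """Return {pattern: support} for attribute-value patterns that occur
    in at least min_support records. A pattern is a frozenset of
    (attribute_index, value) pairs.

    Assumption: brute-force counting stands in for FP-Tree mining here.
    """
    counts = Counter()
    for row in records:
        # Encode the record as (attribute index, value) items.
        items = sorted(enumerate(row), key=lambda item: item[0])
        # Count every non-empty combination of items in this record.
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {p: c for p, c in counts.items() if c >= min_support}


def candidate_subspaces(records, min_support):
    """Map each frequent pattern to (subspace, member row indices).

    The subspace is the tuple of attribute indices the pattern fixes;
    the members are the records that match the pattern on that subspace.
    """
    result = {}
    for pattern in frequent_patterns(records, min_support):
        subspace = tuple(sorted(attr for attr, _ in pattern))
        members = [
            i for i, row in enumerate(records)
            if all(row[attr] == value for attr, value in pattern)
        ]
        result[pattern] = (subspace, members)
    return result


# Hypothetical toy dataset: (colour, shape, size) per object.
RECORDS = [
    ("red", "round", "small"),
    ("red", "round", "large"),
    ("red", "round", "small"),
    ("blue", "square", "large"),
    ("blue", "square", "small"),
]

for pattern, (subspace, members) in candidate_subspaces(RECORDS, 3).items():
    print(sorted(pattern), "subspace:", subspace, "members:", members)
```

With a support threshold of 3, the pattern colour = red, shape = round is frequent, so attributes (0, 1) form a candidate subspace whose matching objects are rows 0, 1 and 2. How FPSUB scores and merges such candidates into final clusters is not specified in the abstract and is not modelled here.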