Instead of searching for clusters in the full feature space, subspace clustering aims to detect clusters embedded in different subspaces, which makes it an effective approach to clustering high-dimensional data. Traditional clustering algorithms rely mainly on distance measures and therefore have difficulty handling high-dimensional data. This paper proposes a subspace clustering algorithm that can handle high-dimensional data, the attribute relevancy-based subspace clustering algorithm (ARSUB). ARSUB maps attributes to items, turning the clustering problem into a frequent itemset mining problem. It then builds a relevancy matrix from strongly correlated item pairs to measure the relevance between any two itemsets, from which strongly correlated candidate subspaces are obtained. Finally, the candidate subspaces are used to find the clusters that exist in different subspaces. Experiments on both synthetic and real datasets demonstrate the feasibility of the algorithm and show that it achieves high accuracy and efficiency.
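The abstract does not say how attributes become items, which measure makes an item pair "strongly correlated", or how candidate subspaces are grown. The Python sketch below therefore fills in each of these steps with a labeled assumption, so the overall pipeline can be illustrated:

* Equal-width binning turns each attribute value into an item `(attribute, bin)`.
* Lift stands in for the relevancy measure.
* Connected components of the thresholded attribute graph stand in for candidate subspaces.
* Dense grid cells stand in for the final clustering step.

This is not ARSUB itself. All function names and parameters (`n_bins`, `min_support`, `threshold`, `min_count`) and the toy data `SAMPLE` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def to_transactions(data, n_bins):
    """Turn each numeric record into a set of items (attribute, bin).

    Assumption: equal-width binning per attribute stands in for the
    attribute-to-item mapping, which the abstract does not specify.
    """
    n_attrs = len(data[0])
    lows = [min(row[a] for row in data) for a in range(n_attrs)]
    highs = [max(row[a] for row in data) for a in range(n_attrs)]
    transactions = []
    for row in data:
        items = set()
        for a, value in enumerate(row):
            width = (highs[a] - lows[a]) / n_bins or 1.0  # guard constant columns
            items.add((a, min(int((value - lows[a]) / width), n_bins - 1)))
        transactions.append(items)
    return transactions


def relevancy_matrix(transactions, n_attrs, min_support):
    """Attribute-by-attribute matrix holding the highest lift of any
    frequent item pair drawn from the two attributes.

    Assumption: lift = P(x, y) / (P(x) P(y)) is a stand-in for the
    paper's relevancy measure.
    """
    n = len(transactions)
    single = Counter(item for t in transactions for item in t)
    # Each transaction has one item per attribute, so every pair spans two attributes.
    pairs = Counter(p for t in transactions for p in combinations(sorted(t), 2))
    matrix = [[0.0] * n_attrs for _ in range(n_attrs)]
    for (x, y), count in pairs.items():
        support = count / n
        if support < min_support:
            continue  # skip infrequent item pairs
        lift = support / ((single[x] / n) * (single[y] / n))
        i, j = x[0], y[0]
        matrix[i][j] = matrix[j][i] = max(matrix[i][j], lift)
    return matrix


def candidate_subspaces(matrix, threshold):
    """Connected components of the graph whose edges are attribute pairs
    with relevancy >= threshold; single attributes are dropped.
    Assumption: a simplification of candidate-subspace generation.
    """
    n = len(matrix)
    seen, subspaces = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:  # depth-first traversal of one component
            a = stack.pop()
            component.append(a)
            for b in range(n):
                if b not in seen and matrix[a][b] >= threshold:
                    seen.add(b)
                    stack.append(b)
        if len(component) > 1:
            subspaces.append(tuple(sorted(component)))
    return subspaces


def clusters_in_subspace(transactions, subspace, min_count):
    """Group records by their bins on the subspace's attributes and keep
    cells holding at least min_count records (a grid-density stand-in)."""
    cells = {}
    for idx, t in enumerate(transactions):
        key = tuple(b for (a, b) in sorted(t) if a in subspace)
        cells.setdefault(key, []).append(idx)
    return {k: v for k, v in cells.items() if len(v) >= min_count}


# Toy data: attributes 0 and 1 move together, attribute 2 is noise.
SAMPLE = [
    (0.0, 0.1, 0.0), (0.2, 0.0, 5.0), (0.1, 0.2, 9.0), (0.3, 0.1, 3.0),
    (9.0, 9.2, 1.0), (9.1, 9.0, 7.0), (9.3, 9.1, 4.0), (9.2, 9.3, 8.0),
]

if __name__ == "__main__":
    tx = to_transactions(SAMPLE, n_bins=2)
    m = relevancy_matrix(tx, n_attrs=3, min_support=0.2)
    for sub in candidate_subspaces(m, threshold=1.5):
        print(sub, clusters_in_subspace(tx, sub, min_count=3))
    # prints: (0, 1) {(0, 0): [0, 1, 2, 3], (1, 1): [4, 5, 6, 7]}
```

On the toy data, each item pair across attributes 0 and 1 has lift 2.0, while pairs involving attribute 2 have lift 1.0. As a result, only the subspace `(0, 1)` passes the threshold, and it splits the eight records into two dense cells. Swapping in a different relevancy measure or subspace-growing rule only changes `relevancy_matrix` or `candidate_subspaces`.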