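To address the high time complexity and input-parameter sensitivity of existing subspace clustering algorithms, an efficient subspace clustering algorithm based on attribute clustering is proposed. The algorithm first filters out redundant attributes by computing the Gini value of each attribute. It then builds a relation matrix over the non-redundant attributes, using a relation function based on the two-dimensional joint Gini value, to measure the correlation between any two non-redundant attributes. Next, a clustering algorithm that permits overlap is applied to the relation matrix, and its result is the candidate set of all interesting subspaces. Finally, a clustering algorithm is invoked to find all clusters that exist in these subspaces. Experiments on synthetic and real datasets show that the new algorithm performs well in both time complexity and the ability to find subspace clusters, and that it is not very sensitive to the values of its input parameters.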
Many recently proposed subspace clustering methods suffer from two severe problems. First, the algorithms typically scale exponentially with the data dimensionality or the subspace dimensionality of clusters. Second, the clustering results are often sensitive to input parameters. A fast subspace clustering algorithm based on attribute clustering is proposed to overcome these limitations. The algorithm first filters out redundant attributes by computing the Gini coefficient of each attribute. To evaluate the correlation between each pair of non-redundant attributes, a relation matrix of the non-redundant attributes is constructed from a relation function of two-dimensional joint Gini coefficients. Applying an overlapping clustering algorithm to the relation matrix yields the candidate set of all interesting subspaces. Finally, all subspace clusters are obtained by clustering within the interesting subspaces. Experiments on both synthetic and real datasets show that the new algorithm not only achieves a significant gain in runtime and in the quality of the subspace clusters found, but is also insensitive to input parameters.
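
The first three steps of the pipeline described above (attribute filtering, relation matrix, overlapping grouping) can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so everything concrete here is an assumption made for illustration: the Gini value is taken to be the Gini impurity of a histogram, the relation function is a hypothetical "joint concentration minus product of marginal concentrations" score (zero when two binned attributes are independent), and the overlapping clustering step is replaced by a simple thresholded-neighborhood grouping. The function names and the parameters `bins`, `gini_max`, and `rel_min` are likewise hypothetical and are not taken from the paper.

```python
import numpy as np


def concentration(counts):
    """Sum of squared bin probabilities (1 minus the Gini impurity)."""
    p = counts / counts.sum()
    return float(np.sum(p ** 2))


def gini_value(x, bins=10):
    """Gini impurity of one attribute after equal-width binning (assumed definition)."""
    counts, _ = np.histogram(x, bins=bins)
    return 1.0 - concentration(counts)


def relation(x, y, bins=10):
    """Hypothetical relation score from the 2-D joint histogram.

    Equals 0 when the binned attributes are independent and grows as
    the joint distribution becomes more concentrated than that.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    cx = concentration(joint.sum(axis=1))  # marginal of x
    cy = concentration(joint.sum(axis=0))  # marginal of y
    return concentration(joint) - cx * cy


def candidate_subspaces(X, bins=10, gini_max=0.85, rel_min=0.05):
    """Filter near-uniform attributes, build the relation matrix, group attributes."""
    X = np.asarray(X, dtype=float)
    # Step 1: treat attributes with a high Gini value (near-uniform) as redundant.
    keep = [j for j in range(X.shape[1]) if gini_value(X[:, j], bins) < gini_max]
    # Step 2: pairwise relation matrix over the remaining attributes.
    m = len(keep)
    R = np.zeros((m, m))
    for a in range(m):
        for b in range(a + 1, m):
            R[a, b] = R[b, a] = relation(X[:, keep[a]], X[:, keep[b]], bins)
    # Step 3: stand-in for overlapping clustering. Each attribute's thresholded
    # neighborhood is one candidate subspace, so an attribute may appear in several.
    groups = set()
    for a in range(m):
        members = {keep[a]} | {keep[b] for b in range(m) if b != a and R[a, b] >= rel_min}
        if len(members) >= 2:
            groups.add(tuple(sorted(members)))
    return keep, R, sorted(groups)
```

Each returned tuple lists the column indices of one candidate subspace. The final step of the abstract would then run an ordinary clustering algorithm on the data projected onto each tuple; that step is omitted here because the abstract does not say which clustering algorithm is used.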