Sparse subspace clustering (SSC) is a data clustering framework based on spectral clustering. High-dimensional data usually lie in a union of several low-dimensional subspaces, so they admit a sparse representation under an appropriate dictionary. SSC methods use the sparse representation coefficients of the data to build an affinity matrix and then apply spectral clustering to this matrix to obtain the subspace clustering result. The key is to design a representation model that reveals the true subspace structure of the high-dimensional data, so that the resulting representation coefficients, and the affinity matrix built from them, lead to accurate subspace clustering. SSC has been widely studied and applied in machine learning, computer vision, image processing, pattern recognition, system identification and other fields, but there is still considerable room for development. This paper reviews in detail the models, algorithms and applications of existing SSC methods, analyzes their limitations, and points out directions for further research.
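For orientation, the pipeline summarized above (sparse self-representation, then affinity matrix, then spectral clustering) can be written compactly as follows. This is a sketch of the basic noise-free formulation commonly associated with SSC, not necessarily the exact model or notation used in the body of the paper. All symbols are assumptions introduced here: $X \in \mathbb{R}^{d \times N}$ holds the $N$ data points as columns, $C$ is the coefficient matrix, $W$ the affinity matrix, $D$ the degree matrix and $L$ the normalized graph Laplacian.

```latex
% Assumed notation (not from the source): X is the d x N data matrix,
% C the N x N coefficient matrix, W the affinity, D the degree matrix.
\begin{align}
  % Step 1: each point is represented sparsely by the other points
  % (the data matrix itself serves as the dictionary).
  &\min_{C}\ \|C\|_{1}
     \quad \text{s.t.} \quad X = XC,\ \operatorname{diag}(C) = 0, \\
  % Step 2: symmetrize the coefficient magnitudes into an affinity matrix.
  &W = |C| + |C|^{\top}, \\
  % Step 3: form the normalized graph Laplacian used by spectral clustering.
  &L = I - D^{-1/2} W D^{-1/2},
     \qquad D = \operatorname{diag}(W \mathbf{1}).
\end{align}
```

The cluster labels are then typically obtained by running k-means on the rows of the matrix formed by the eigenvectors of $L$ associated with its $k$ smallest eigenvalues, where $k$ is the assumed number of subspaces. The constraint $\operatorname{diag}(C) = 0$ rules out the trivial solution in which each point represents itself.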