Reducing the dimensionality of high-dimensional data with principal component analysis (PCA) can effectively improve the performance of clustering algorithms, but PCA tends to discard some components that contribute to clustering. To retain these components during dimensionality reduction, an iterative algorithm that interleaves dimensionality reduction with clustering is proposed. Each iteration can be formulated as a constrained optimization problem that has an analytical solution, which yields the corresponding iterative clustering algorithm, called document clustering based on constrained principal component analysis. Experimental results on the Reuters-21578 and WebKB document collections show that the proposed method performs better than k-means clustering, non-negative matrix factorization clustering, and spectral clustering.
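For orientation only, the conventional pipeline that the abstract takes as its starting point (reduce dimensionality with PCA first, then cluster in the reduced space) might be sketched as follows. This is not the constrained principal component method proposed here, whose constraint and analytical solution are not given in the abstract. The function names, the power-iteration eigen-solver, and the farthest-point initialization of k-means are all illustrative assumptions.

```python
import math
import random


def center(X):
    """Subtract the per-feature mean from every row."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    return [[row[j] - mean[j] for j in range(d)] for row in X]


def covariance(Xc):
    """Sample covariance matrix of already-centered data."""
    n, d = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_components(C, k, iters=200, seed=0):
    """Leading k eigenvectors of a symmetric PSD matrix via power
    iteration with deflation (an assumed, simple eigen-solver)."""
    rng = random.Random(seed)
    d = len(C)
    C = [row[:] for row in C]  # work on a copy; deflation mutates it
    comps = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(d))
                  for i in range(d))
        comps.append(v)
        # Deflate: remove the component just found from the matrix.
        for i in range(d):
            for j in range(d):
                C[i][j] -= lam * v[i] * v[j]
    return comps


def sqdist(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(Z, k, iters=50):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    cents = [Z[0][:]]
    while len(cents) < k:
        far = max(Z, key=lambda p: min(sqdist(p, c) for c in cents))
        cents.append(far[:])
    labels = [0] * len(Z)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sqdist(p, cents[c])) for p in Z]
        for c in range(k):
            members = [p for p, l in zip(Z, labels) if l == c]
            if members:  # keep the old centroid if a cluster empties
                cents[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def pca_kmeans(X, n_components, n_clusters):
    """Baseline: project onto the leading principal components, then cluster."""
    Xc = center(X)
    comps = top_components(covariance(Xc), n_components)
    Z = [[sum(a * b for a, b in zip(row, c)) for c in comps] for row in Xc]
    return kmeans(Z, n_clusters)
```

Calling `pca_kmeans(X, n_components, n_clusters)` on a list of feature vectors returns one cluster label per row. Because the projection is fixed before clustering starts, any direction that PCA drops can no longer influence the cluster assignment, which is the limitation the proposed iterative method is meant to address.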