Existing cluster ensemble spectral algorithms tend to produce unstable clustering results. To address this problem, the idea of affinity propagation clustering is introduced and an affinity propagation-based cluster ensemble spectral algorithm (APCESA) is proposed. The algorithm first applies cluster ensemble and spectral decomposition to obtain a low-dimensional embedding of the documents with a relatively simple spatial structure, and then obtains the final clustering result with the affinity propagation algorithm. In the spectral decomposition step, a matrix transformation is used to avoid the high computational cost of eigenvalue decomposition in spectral algorithms. Experiments on real-world document data sets show that the proposed algorithm is more stable than the compared algorithms, and that both the NMI and ANMI values of its clustering results are higher than those of the compared algorithms.
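The two evaluation measures named above can be illustrated with a minimal sketch. The abstract does not say which normalization of NMI is used, so the code below assumes the geometric-mean form, I(a;b) / sqrt(H(a) H(b)), and takes ANMI to be the average NMI between a consensus labeling and each base clustering of the ensemble, as is common in the cluster ensemble literature. The function names `entropy`, `nmi`, and `anmi` are illustrative and do not come from the paper.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def nmi(a, b):
    """NMI with geometric-mean normalization (an assumption, see lead-in)."""
    if len(a) != len(b):
        raise ValueError("label lists must have the same length")
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    # Mutual information: sum over observed label pairs of p(x,y) * log(p(x,y) / (p(x) p(y))).
    mutual_info = sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )
    denom = sqrt(entropy(a) * entropy(b))
    # Convention chosen here: if either labeling has a single cluster, return 0.
    return mutual_info / denom if denom > 0 else 0.0


def anmi(consensus, base_clusterings):
    """Average NMI between a consensus labeling and each base clustering."""
    return sum(nmi(consensus, base) for base in base_clusterings) / len(base_clusterings)
```

NMI is typically computed against ground-truth class labels, while ANMI needs only the ensemble members, so it can be evaluated without labels. Both are invariant to renaming of cluster identifiers: `nmi([0, 0, 1, 1], [1, 1, 0, 0])` is a perfect match under this definition.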