This paper proposes an improved algorithm for choosing the initial values of k-means document clustering, based on an adapted minimum-maximum principle. The method first constructs a similarity matrix and then applies the adapted minimum-maximum principle to that matrix, selecting the initial seeds and automatically determining the number of clusters k. Experimental results show that the value of k found by this method is close to the true value.
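The two steps named in the abstract (building a similarity matrix, then selecting seeds and k from it) can be illustrated with a minimal sketch. The abstract does not spell out the adapted minimum-maximum rule, so the code below uses a generic min-max heuristic as a stand-in: it starts from the least similar pair of documents and keeps adding the document whose highest similarity to the current seeds is lowest. The function names, the use of cosine similarity, and the `threshold` stopping parameter are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between rows of X (documents x terms)."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty documents
    Xn = X / norms
    return Xn @ Xn.T


def min_max_seeds(S, threshold=0.5):
    """Select seed indices with a generic min-max rule (an assumed stand-in).

    `threshold` is a hypothetical stopping parameter: a document becomes a
    new seed only if its similarity to every existing seed is below it.
    The number of seeds returned serves as the estimate of k.
    """
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    if n == 0:
        return []
    if n == 1:
        return [0]

    # Start with the least similar pair, ignoring the diagonal.
    masked = S.copy()
    np.fill_diagonal(masked, np.inf)
    i, j = np.unravel_index(np.argmin(masked), masked.shape)
    if S[i, j] >= threshold:
        return [int(i)]  # all documents look alike: a single cluster
    seeds = [int(i), int(j)]

    while len(seeds) < n:
        # For each document, its similarity to the closest existing seed.
        max_sim = S[:, seeds].max(axis=1)
        max_sim[seeds] = np.inf  # existing seeds cannot be picked again
        candidate = int(np.argmin(max_sim))
        if max_sim[candidate] >= threshold:
            break  # every remaining document is close to some seed
        seeds.append(candidate)
    return seeds


# Toy term-frequency vectors: three groups of two documents each.
X = np.array([
    [1.0, 0.0, 0.0], [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0], [0.0, 0.9, 0.1],
    [0.0, 0.0, 1.0], [0.1, 0.0, 0.9],
])
S = cosine_similarity_matrix(X)
seeds = min_max_seeds(S, threshold=0.5)
k = len(seeds)
print("seed indices:", seeds, "estimated k:", k)
```

The rows `X[seeds]` could then be handed to a k-means implementation as initial centers, for example through the `init` argument of scikit-learn's `KMeans`, which accepts an array of starting centroids. The result depends strongly on the assumed `threshold`, so it would need tuning for a real corpus.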