The k-means algorithm requires users to specify the number of clusters k, the threshold t, and the initial cluster centers Q in advance. To address this, this paper presents a hierarchy-based k-means clustering algorithm (HKMA). The algorithm first applies a hierarchical method to produce an initial clustering of the documents and takes the resulting number of clusters as the value of k for k-means. It then refines the initial clustering with k-means. Finally, experiments evaluate the accuracy and time efficiency of the algorithm; in comparison with existing hard clustering algorithms such as k-means and its variants, the proposed algorithm performs better.
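
The two-stage idea described above can be illustrated with a minimal sketch. It is not the authors' HKMA implementation: the abstract does not say which linkage is used or how the hierarchy is cut, so the centroid linkage, the `cut_distance` stopping rule, the use of the hierarchical centroids as starting centers, and all function names below are assumptions made only for illustration. Documents are assumed to be already represented as numeric vectors.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def hierarchical_init(points, cut_distance):
    """Stage 1 (assumed form): agglomerative merging with centroid linkage.

    Merging stops once the two closest clusters are farther apart than
    cut_distance (a hypothetical stopping rule). The number of centroids
    returned plays the role of k.
    """
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > cut_distance:
            break
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [centroid(c) for c in clusters]

def kmeans(points, centers, max_iter=100):
    """Stage 2: standard Lloyd iterations starting from the given centers."""
    centers = list(centers)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable, so converged
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = centroid(members)
    return labels, centers

def hkma_sketch(points, cut_distance):
    """Hierarchical initialisation followed by k-means refinement."""
    centers = hierarchical_init(points, cut_distance)
    return kmeans(points, centers)

# Toy data: two well-separated groups of 2-D vectors.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = hkma_sketch(points, cut_distance=3.0)
print(len(centers), labels)  # 2 [0, 0, 0, 1, 1, 1]
```

In this sketch the hierarchical stage is quadratic in the number of clusters at each merge step, so it is only suitable for small inputs; it is meant to show how the first stage supplies k and the starting centers that plain k-means would otherwise require from the user.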