This paper proposes a global optimization algorithm for selecting the value of K in semi-supervised K-means. The algorithm removes the restriction in traditional methods that K must equal the number of sample classes, and it uses only a small amount of labeled data to guide and organize a large amount of unlabeled data. By combining the distribution characteristics of the data set with the supervision information available in each cluster after clustering, a voting rule is used to assign class labels to the data in each cluster. Experiments show that the proposed method can effectively find the K value and cluster centers best suited to the data set and improve clustering performance.
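The idea of labeling clusters by vote and searching over K can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not state the criterion used to compare K values, so the sketch assumes one (the fraction of labeled points that agree with their cluster's voted label) and also assumes a deterministic farthest-first seeding. The names `kmeans`, `vote_cluster_labels`, and `search_k` are hypothetical, as is the toy data.

```python
from collections import Counter

def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iters=100):
    """Lloyd's K-means with farthest-first seeding (an assumption of this sketch)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    assign = None
    for _ in range(max_iters):
        new_assign = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_assign == assign:
            break  # assignments stopped changing
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, assign

def vote_cluster_labels(assign, labeled, k):
    """Majority vote of the labeled points in each cluster; None if a cluster has none."""
    votes = [Counter() for _ in range(k)]
    for idx, label in labeled.items():
        votes[assign[idx]][label] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]

def search_k(points, labeled, k_values):
    """Assumed criterion: pick the K whose voted labels agree best with the labeled points."""
    best = None
    for k in k_values:
        centers, assign = kmeans(points, k)
        cluster_labels = vote_cluster_labels(assign, labeled, k)
        agree = sum(cluster_labels[assign[i]] == y for i, y in labeled.items()) / len(labeled)
        if best is None or agree > best[0]:  # ties keep the K tried earlier
            best = (agree, k, centers, assign, cluster_labels)
    return best[1:]

# Toy data: class "a" forms two separate groups, so K = 2 (the class count) is too small.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 0), (8, 1), (9, 0), (9, 1),
          (20, 0), (20, 1), (21, 0), (21, 1)]
labeled = {0: "a", 1: "a", 2: "a", 4: "b", 5: "b", 8: "a"}  # index -> known class
k, centers, assign, cluster_labels = search_k(points, labeled, range(1, 5))
predicted = [cluster_labels[a] for a in assign]  # voted label for every point
print(k, predicted)
```

In this toy data one class occupies two separate regions, which is the situation where fixing K to the number of classes forces points of different classes into one cluster. The agreement score used here cannot decrease much as K grows, so a practical criterion would likely need a penalty on large K; that choice is left open in this sketch.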