This paper presents a distributed clustering algorithm suited to large data sets. The algorithm modifies the traditional K-means algorithm so that it can run in a distributed environment. Based on a complexity analysis, it is compared with the traditional centralized (serial) K-means algorithm and with other existing distributed algorithms. Experimental results show that the distributed algorithm preserves all the necessary characteristics of the centralized K-means algorithm while improving data processing speed.
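
The abstract does not spell out how K-means is adapted to a distributed setting, so the following is only a minimal sketch of one common scheme, not necessarily the one proposed here. It assumes each node holds one partition of the data, computes per-cluster coordinate sums and counts locally, and sends only those statistics to a coordinator that updates the centroids. The names `local_stats`, `merge_and_update`, and `distributed_kmeans` are hypothetical, and the nodes are simulated by a plain loop in a single process.

```python
def local_stats(points, centroids):
    """Work done on one node: assign each local point to its nearest
    centroid and return per-cluster coordinate sums and point counts."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        # Index of the nearest centroid by squared Euclidean distance.
        j = min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def merge_and_update(stats, centroids):
    """Work done on the coordinator: combine the statistics from all
    nodes and recompute each centroid as the global mean of its cluster."""
    k, dim = len(centroids), len(centroids[0])
    new_centroids = []
    for j in range(k):
        n = sum(counts[j] for _, counts in stats)
        if n == 0:
            # Empty cluster: keep the previous centroid (an assumption).
            new_centroids.append(list(centroids[j]))
        else:
            new_centroids.append(
                [sum(sums[j][d] for sums, _ in stats) / n for d in range(dim)])
    return new_centroids


def distributed_kmeans(partitions, centroids, max_iter=100, tol=1e-9):
    """Iterate local assignment and global update until centroids settle."""
    centroids = [[float(x) for x in c] for c in centroids]
    for _ in range(max_iter):
        # Each call stands in for a computation on a separate node.
        stats = [local_stats(part, centroids) for part in partitions]
        new_centroids = merge_and_update(stats, centroids)
        shift = max(abs(a - b)
                    for old, new in zip(centroids, new_centroids)
                    for a, b in zip(old, new))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids


# Two simulated nodes, two clusters, hand-picked initial centroids.
partitions = [[(0, 0), (0, 2)], [(10, 10), (10, 12)]]
result = distributed_kmeans(partitions, [(0, 0), (10, 10)])
print(result)
```

Under this scheme each node sends only k sums and k counts per iteration instead of its raw points, and the centroid update matches what a serial K-means step would compute over the pooled data. That is one way a distributed variant can keep the behavior of the centralized algorithm while spreading the assignment work across nodes.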