A distributed clustering algorithm over a P2P (peer-to-peer) network uses the computing and storage capacity of every peer, together with the network bandwidth, to spread the algorithm's time and space complexity across the peers. This makes it possible to process and analyze massive distributed data and overcomes the data-processing limits of traditional centralized clustering algorithms that run on a single server. This paper presents a distributed K-means clustering algorithm based on a per-peer confidence radius. Each peer computes the density of its local data to identify where points of the same cluster are densely and sparsely distributed. From this it derives a clustering confidence radius that guides the next clustering round. Experimental results show that the algorithm effectively reduces the number of iterations and saves network bandwidth, while producing clustering results close to those of the centralized clustering algorithm.
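The abstract describes the method only at a high level, so the following Python code is a sketch under stated assumptions, not the paper's algorithm. It simulates the peers in a single process and uses a common distributed K-means pattern: each peer sends per-cluster coordinate sums and counts, and these are merged into global centroids. The confidence radius is modeled, as an assumption, as the q-quantile of the distances from a peer's local points to their centroid. Points within that radius are treated as the dense core and keep their label in the next round, and points outside it are reassigned. All function names (`confidence_radii`, `peer_round`, `merge`, `distributed_kmeans`) and the parameter `q` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

# ASSUMPTION: illustrative sketch only; the radius rule below is a stand-in,
# not the confidence-radius computation from the paper.


def nearest(p: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))


def confidence_radii(points: Sequence[Point], labels: Sequence[int],
                     centroids: Sequence[Point], q: float = 0.5) -> List[float]:
    """HYPOTHETICAL density proxy: for each cluster, the q-quantile of the
    distances from this peer's points to that cluster's centroid. Points inside
    the radius count as the dense core, points outside as the sparse fringe."""
    radii = []
    for j, c in enumerate(centroids):
        ds = sorted(math.dist(p, c) for p, lab in zip(points, labels) if lab == j)
        radii.append(ds[int(q * (len(ds) - 1))] if ds else 0.0)
    return radii


def peer_round(points, labels, centroids, radii):
    """Local step on one peer. A point still within its cluster's confidence
    radius keeps its label (one distance check instead of k); every other
    point is reassigned to its nearest centroid. Returns the new labels and
    the per-cluster sums and counts that a real peer would send over the network."""
    k, d = len(centroids), len(centroids[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    new_labels = []
    for p, lab in zip(points, labels):
        if lab < 0 or math.dist(p, centroids[lab]) > radii[lab]:
            lab = nearest(p, centroids)
        new_labels.append(lab)
        counts[lab] += 1
        for i, x in enumerate(p):
            sums[lab][i] += x
    return new_labels, sums, counts


def merge(stats, old_centroids):
    """Combine all peers' (sums, counts) into new global centroids.
    An empty cluster keeps its previous centroid."""
    k, d = len(old_centroids), len(old_centroids[0])
    total = [[0.0] * d for _ in range(k)]
    n = [0] * k
    for sums, counts in stats:
        for j in range(k):
            n[j] += counts[j]
            for i in range(d):
                total[j][i] += sums[j][i]
    return [tuple(t / n[j] for t in total[j]) if n[j] else old_centroids[j]
            for j in range(k)]


def distributed_kmeans(peers, init, q=0.5, max_iter=50, tol=1e-9):
    """Simulate all peers in one process; returns (centroids, labels, rounds)."""
    centroids = [tuple(map(float, c)) for c in init]
    k = len(centroids)
    labels = [[-1] * len(pts) for pts in peers]  # -1 means not yet assigned
    radii = [[0.0] * k for _ in peers]           # round 1: no confident points
    for rnd in range(1, max_iter + 1):
        stats = []
        for i, pts in enumerate(peers):
            labels[i], sums, counts = peer_round(pts, labels[i], centroids, radii[i])
            stats.append((sums, counts))
        new = merge(stats, centroids)
        shift = max(math.dist(a, b) for a, b in zip(new, centroids))
        centroids = new
        if shift <= tol:
            break
        # Each peer recomputes its radii from its own local data distribution.
        radii = [confidence_radii(pts, labels[i], centroids, q)
                 for i, pts in enumerate(peers)]
    return centroids, labels, rnd
```

For example, `distributed_kmeans([[(0, 0), (0, 1), (10, 10)], [(1, 0), (10, 11), (11, 10)]], [(0, 0), (10, 10)])` simulates two peers holding points from two clusters. In this sketch the radius only saves distance computations on each peer; it does not attempt to reproduce the reduced iteration counts or bandwidth savings reported in the paper, and freezing core points can make the result differ from plain K-means.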