This paper proposes a density-grid-based clustering algorithm for distributed data stream environments. Each local site updates its data stream summary quickly, so that its grid space reflects the current state of the stream. The central site receives and merges the local grid structures, then performs density-grid clustering and noise-grid optimization on the global grid structure to produce the global clustering result. Experimental results show that the algorithm reduces network traffic and improves global clustering accuracy.
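The local-summarize, central-merge, global-cluster flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the fixed `cell_size`, the `min_density` threshold, and the rule that dense cells touching each other (including diagonally) share a cluster are all assumptions made for illustration. The abstract does not specify the noise-grid optimization step, so the sketch simply leaves sparse cells unlabeled instead of reproducing it.

```python
from collections import Counter, deque
from itertools import product


def local_grid(points, cell_size):
    """Local site: map each point to a grid cell and count points per cell.

    Only this compact summary (cell -> count) would be sent to the central
    site, rather than the raw points. `cell_size` is an assumed parameter.
    """
    grid = Counter()
    for point in points:
        cell = tuple(int(x // cell_size) for x in point)
        grid[cell] += 1
    return grid


def merge_grids(grids):
    """Central site: merge local grid summaries by summing per-cell counts."""
    total = Counter()
    for grid in grids:
        total.update(grid)
    return total


def cluster_dense_cells(grid, min_density):
    """Group adjacent dense cells into clusters (breadth-first search).

    A cell is treated as dense when its count is at least `min_density`
    (an assumed threshold). Sparse cells get no label; the paper's
    noise-grid optimization is not reproduced here.
    """
    dense = {cell for cell, count in grid.items() if count >= min_density}
    labels = {}
    next_label = 0
    for start in sorted(dense):
        if start in labels:
            continue
        labels[start] = next_label
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            # Visit every neighbor whose coordinates differ by at most 1.
            for offset in product((-1, 0, 1), repeat=len(cell)):
                neighbor = tuple(c + o for c, o in zip(cell, offset))
                if neighbor in dense and neighbor not in labels:
                    labels[neighbor] = next_label
                    queue.append(neighbor)
        next_label += 1
    return labels
```

In this sketch each site would call `local_grid` on its own portion of the stream, and the central site would call `merge_grids` followed by `cluster_dense_cells`. Sending per-cell counts instead of raw points is one way the reduction in network traffic could arise, though the paper's actual update and transmission scheme may differ.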