To meet the clustering needs of large-scale spatial databases, this paper proposes a density-based parallel clustering algorithm for computer clusters. The algorithm partitions the data according to the distribution characteristics of the database, clusters the data blocks in parallel on the individual nodes, and merges the clustering results on the master node. Experimental results show that the computing speed of the algorithm increases linearly with the number of nodes, indicating good scalability.
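The partition, cluster, and merge pipeline described above can be illustrated with a minimal Python sketch. The abstract does not specify the partitioning rule, the local clustering routine, or the merge criterion, so everything here is an assumption made for illustration: the data are cut into equal-count strips along the x axis (a simple stand-in for distribution-aware partitioning), each strip is padded with a halo of width `2 * eps`, a plain quadratic DBSCAN runs on each block, and the master joins local clusters that share a point known to be a core point in its home block. A thread pool stands in for the cluster nodes; the names `dbscan`, `partition`, `cluster_block`, and `parallel_dbscan` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from math import dist


def dbscan(points, eps, min_pts):
    """Plain O(n^2) DBSCAN. Returns (labels, core_flags); label -1 means noise."""
    n = len(points)
    nbrs = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
            for i in range(n)]
    core = [len(nb) >= min_pts for nb in nbrs]  # neighbourhood includes the point itself
    labels, cluster = [-1] * n, 0
    for i in range(n):
        if not core[i] or labels[i] != -1:
            continue
        labels[i], stack = cluster, [i]
        while stack:  # grow the cluster outward from core points only
            for q in nbrs[stack.pop()]:
                if labels[q] == -1:
                    labels[q] = cluster
                    if core[q]:
                        stack.append(q)
        cluster += 1
    return labels, core


def partition(points, n_blocks, eps):
    """Assumed scheme: equal-count strips along x, each padded by a 2*eps halo."""
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    size = max(1, -(-len(order) // n_blocks))  # ceiling division
    blocks = []
    for s in range(0, len(order), size):
        home = order[s:s + size]
        home_set = set(home)
        lo, hi = points[home[0]][0], points[home[-1]][0]
        halo = [i for i in order if i not in home_set
                and lo - 2 * eps <= points[i][0] <= hi + 2 * eps]
        blocks.append((home, halo))
    return blocks


def cluster_block(points, block, eps, min_pts):
    """Work done on one node: cluster the home points together with the halo."""
    home, halo = block
    ids = home + halo
    labels, core = dbscan([points[i] for i in ids], eps, min_pts)
    return ids, labels, core, len(home)


def parallel_dbscan(points, eps, min_pts, n_nodes):
    blocks = partition(points, n_nodes, eps)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:  # stand-in for nodes
        results = list(pool.map(
            lambda b: cluster_block(points, b, eps, min_pts), blocks))

    parent = {}  # union-find over (block index, local label) pairs

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    home_label, is_core = {}, {}
    for b, (ids, labels, core, n_home) in enumerate(results):
        for k in range(n_home):
            home_label[ids[k]] = (b, labels[k]) if labels[k] != -1 else None
            is_core[ids[k]] = core[k]
    # Merge on the master: a point that is core in its home block ties its
    # home cluster to every cluster it joined as a halo point elsewhere.
    for b, (ids, labels, core, n_home) in enumerate(results):
        for k in range(n_home, len(ids)):
            g = ids[k]
            if is_core[g] and labels[k] != -1:
                parent[find((b, labels[k]))] = find(home_label[g])

    roots, out = {}, []
    for g in range(len(points)):
        hl = home_label[g]
        out.append(-1 if hl is None else roots.setdefault(find(hl), len(roots)))
    return out
```

The `2 * eps` halo is a design choice of this sketch, not something stated in the abstract: it ensures that every point within `eps` of a strip sees its full neighbourhood inside that block, so core status is decided correctly on both sides of a cut. Border points may still be assigned to a different adjacent cluster than a serial run would choose, which is the usual ambiguity in DBSCAN. On a real cluster the thread pool would be replaced by processes or message passing between nodes.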