Traditional grid-based data stream clustering algorithms cluster on a grid of a single granularity, which improves processing speed but lowers clustering accuracy. To address this problem, a new data stream clustering algorithm based on a double-layer grid and density, DBG-Stream, is proposed. The algorithm clusters the data stream on grids of two granularities and, borrowing the idea of the CluStream algorithm, divides the clustering process into two stages. In the online stage, coarse-grained grid cells are used to form the initial clusters; in the offline stage, grid cells located on cluster boundaries are clustered a second time on fine-grained grid cells to improve clustering accuracy. The algorithm also sets its key parameters automatically and improves efficiency through a grid-deletion strategy. Experimental results show that the clustering accuracy of DBG-Stream is considerably higher than that of the D-Stream algorithm, and that it effectively addresses the low clustering accuracy of traditional grid-based clustering algorithms.
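The two-stage idea described above can be illustrated with a minimal Python sketch. It is not the DBG-Stream implementation: the abstract does not specify the density thresholds, any decay model, the automatic parameter setting, or the grid-deletion rule, so all of those are omitted, and the sketch works on a static batch of points rather than a stream. The function names and the parameters `coarse_size`, `fine_size`, `coarse_min`, and `fine_min` are hypothetical and chosen only for illustration.

```python
from collections import Counter, deque
from itertools import product
from math import floor


def cell_of(point, size):
    """Map a point to the integer index of the grid cell that contains it."""
    return tuple(floor(x / size) for x in point)


def neighbours(cell):
    """Yield every cell adjacent to `cell` (sharing a face, edge, or corner)."""
    for offset in product((-1, 0, 1), repeat=len(cell)):
        if any(offset):  # skip the all-zero offset, i.e. the cell itself
            yield tuple(c + o for c, o in zip(cell, offset))


def coarse_clusters(points, coarse_size, coarse_min):
    """Coarse stage: label connected groups of dense coarse cells.

    Assumption: a cell is "dense" when it holds at least `coarse_min` points.
    Returns a dict mapping each dense coarse cell to a cluster label.
    """
    counts = Counter(cell_of(p, coarse_size) for p in points)
    dense = {c for c, n in counts.items() if n >= coarse_min}
    labels = {}
    next_label = 0
    for start in sorted(dense):
        if start in labels:
            continue
        # Breadth-first search over adjacent dense cells.
        labels[start] = next_label
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nb in neighbours(current):
                if nb in dense and nb not in labels:
                    labels[nb] = next_label
                    queue.append(nb)
        next_label += 1
    return labels


def refine_boundaries(points, labels, coarse_size, fine_size, fine_min):
    """Fine stage: re-examine points in boundary cells on the fine grid.

    Assumptions: a boundary cell is a dense coarse cell with at least one
    non-dense neighbour, and `coarse_size` is an integer multiple of
    `fine_size` so that fine cells nest inside coarse cells.
    Returns one label per point; -1 marks noise.
    """
    boundary = {c for c in labels if any(nb not in labels for nb in neighbours(c))}
    fine_counts = Counter(
        cell_of(p, fine_size) for p in points if cell_of(p, coarse_size) in boundary
    )
    result = []
    for p in points:
        coarse = cell_of(p, coarse_size)
        if coarse not in labels:
            result.append(-1)  # point lies in a sparse coarse cell
        elif coarse in boundary and fine_counts[cell_of(p, fine_size)] < fine_min:
            result.append(-1)  # sparse fine cell at the cluster edge
        else:
            result.append(labels[coarse])
    return result
```

In this sketch the coarse pass is cheap because it touches each point once, while the fine pass only counts points that fall in boundary cells, which mirrors the motivation of spending finer resolution only where cluster edges are uncertain.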