This paper proposes a cluster-center initialization method based on density pointers, the density pointer (DP) algorithm. DP takes the geometric center of each grid cell as a center of symmetry and connects it to the cell's vertices, symmetrically partitioning the conventional rectangular-like grid cell into hyper-triangular subspaces. The direction of each density pointer is then determined from the density difference between a hyper-triangular subspace and the adjacent hyper-triangular subspace of the neighboring cell, and the density pointers are used to compute an aggregation factor for every dense grid cell. Finally, the centers of gravity of the grid-cell families with larger local aggregation factors are taken as the initial cluster centers. Experiments on public and synthetic datasets show that DP quickly and efficiently finds data points close to the true cluster centers to serve as initial cluster centers. Efficiency experiments further show that the running time of DP grows linearly with the number of instances, the number of dimensions, and the number of grid cells.
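
The pipeline summarized above can be illustrated with a minimal two-dimensional sketch. It is not the authors' implementation: the abstract does not give the exact definitions, so the choices below are assumptions made for illustration only. These are the name `dp_init_sketch`, the `min_pts` threshold for a "dense" cell, the aggregation factor taken as the number of density pointers pointing into a cell, and the rule for grouping adjacent peak cells into families. In 2-D, each square cell splits into four triangles with the cell center as their common apex.

```python
import math
from collections import defaultdict

# The four triangles of a 2-D cell, named by the cell edge each one faces.
DIRS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def triangle_of(dx, dy):
    """Triangle (apex at the cell centre) that contains the offset (dx, dy)."""
    if abs(dx) >= abs(dy):
        return (1, 0) if dx >= 0 else (-1, 0)
    return (0, 1) if dy >= 0 else (0, -1)


def neighbours(cell):
    return [(cell[0] + d[0], cell[1] + d[1]) for d in DIRS]


def dp_init_sketch(points, cell_size, min_pts=2):
    """Hypothetical 2-D sketch of density-pointer initialization."""
    tri_count = defaultdict(int)   # (cell, triangle) -> number of points
    members = defaultdict(list)    # cell -> points that fall inside it
    for x, y in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        cx = (cell[0] + 0.5) * cell_size
        cy = (cell[1] + 0.5) * cell_size
        tri_count[(cell, triangle_of(x - cx, y - cy))] += 1
        members[cell].append((x, y))

    # Assumption: a cell is "dense" if it holds at least min_pts points.
    dense = {c for c, pts in members.items() if len(pts) >= min_pts}

    # Assumed aggregation factor: number of density pointers pointing INTO
    # the cell, i.e. edges where its own triangle is denser than the
    # triangle of the neighbouring cell on the other side of that edge.
    factor = {}
    for c in dense:
        score = 0
        for d, nb in zip(DIRS, neighbours(c)):
            own = tri_count.get((c, d), 0)
            facing = tri_count.get((nb, (-d[0], -d[1])), 0)
            if own > facing:
                score += 1
        factor[c] = score

    # Peak cells: positive factor that no dense neighbour exceeds.
    peaks = {
        c for c in dense
        if factor[c] > 0
        and all(factor[c] >= factor[n] for n in neighbours(c) if n in dense)
    }

    # Merge adjacent peak cells into families and return each family's
    # centre of gravity as an initial cluster centre.
    centers, seen = [], set()
    for start in sorted(peaks):
        if start in seen:
            continue
        seen.add(start)
        stack, family_pts = [start], []
        while stack:
            c = stack.pop()
            family_pts.extend(members[c])
            for n in neighbours(c):
                if n in peaks and n not in seen:
                    seen.add(n)
                    stack.append(n)
        k = len(family_pts)
        centers.append((sum(p[0] for p in family_pts) / k,
                        sum(p[1] for p in family_pts) / k))
    return centers
```

Each point is binned once and each dense cell is visited a constant number of times, so the sketch runs in time roughly linear in the number of points and cells (apart from sorting the peak cells). This is in the spirit of the linear scaling reported above, though it says nothing about the actual algorithm's constants or its behavior in higher dimensions.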