Existing clustering algorithms perform poorly on datasets whose clusters have different densities, or fail to separate adjacent clusters whose densities differ only slightly. To address this, a new clustering algorithm based on strict nearest neighbors and shared nearest neighbors is proposed. By constructing a shared strict nearest neighbor graph, the algorithm keeps sample points connected within regions of uniform density, breaks the connections between adjacent regions of different density, and removes noise points and isolated points as far as possible. The algorithm can handle datasets containing clusters of different densities and has low time complexity on high-dimensional data. Experimental results show that it effectively finds clusters of different sizes, shapes, and densities.
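The graph construction can be illustrated with a minimal sketch. The abstract does not define "strict nearest neighbor", so the code below assumes it means a mutual k-nearest neighbor (each point appears in the other's k-nearest list), and it assumes that two points are linked when they are strict neighbors and share at least `min_shared` strict neighbors. The function names, the parameters `k` and `min_shared`, and the brute-force distance computation are all hypothetical choices made for illustration, not the procedure of the paper.

```python
import numpy as np


def knn_indices(X, k):
    """Return an (n, k) array holding each point's k nearest neighbors (self excluded)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is never its own neighbor
    return np.argsort(d, axis=1)[:, :k]


def shared_strict_nn_clusters(X, k=4, min_shared=2):
    """Label points by connected components of an assumed shared strict NN graph.

    Points left without any link are labeled -1 (treated as noise or isolated).
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    knn = [set(row) for row in knn_indices(X, k).tolist()]

    # Assumption: "strict" neighbors are mutual k-nearest neighbors.
    strict = [{j for j in knn[i] if i in knn[j]} for i in range(n)]

    # Assumption: link i and j if they are strict neighbors of each other
    # and have at least `min_shared` strict neighbors in common.
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in strict[i]:
            if j > i and len(strict[i] & strict[j]) >= min_shared:
                adj[i].add(j)
                adj[j].add(i)

    # Clusters are the connected components of the resulting graph.
    labels = [-1] * n
    cluster = 0
    for start in range(n):
        if labels[start] != -1 or not adj[start]:
            continue
        labels[start] = cluster
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if labels[v] == -1:
                    labels[v] = cluster
                    stack.append(v)
        cluster += 1
    return labels


# Toy data: a dense group, a sparser group, and one far-away point.
demo_points = np.array([
    [0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5],
    [20.0, 20.0], [23.0, 20.0], [20.0, 23.0], [23.0, 23.0], [21.5, 21.5],
    [60.0, -40.0],
])
demo_labels = shared_strict_nn_clusters(demo_points, k=4, min_shared=2)
print(demo_labels)
```

On this toy data the two groups receive separate labels despite their different spacing, and the distant point is left unlabeled (-1) because no other point lists it among its nearest neighbors. The brute-force distance matrix costs O(n²) time and memory, so this sketch says nothing about the time complexity claimed in the abstract.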