To improve the quality of grid-based clustering, this paper presents a technique for distinguishing outliers from the boundary points of clusters, using the distance from a point in a sparse (low-density) cell to the center of a dense (high-density) cell as the criterion function, and develops the HQGC algorithm based on this technique. Experimental results show that HQGC can discover clusters of arbitrary shape with high accuracy. Because it requires only one scan of the data, HQGC is efficient: its run time is linear in the size of the input data set, and it scales well.
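The distance criterion described above can be illustrated with a minimal sketch. This is not the HQGC algorithm itself, whose details are not given here. It assumes a uniform grid, takes the geometric center of each dense cell as the "center", and checks only the cells adjacent to a sparse cell. The function name `classify_points` and the parameters `cell_size`, `min_pts`, and `max_dist` are hypothetical, introduced only for illustration.

```python
import math
from collections import Counter
from itertools import product


def classify_points(points, cell_size, min_pts, max_dist):
    """Label each point 'core', 'boundary', or 'outlier' (illustrative sketch only)."""
    dim = len(points[0])
    # Map every point to the integer index of the grid cell that contains it.
    cells = [tuple(math.floor(x / cell_size) for x in p) for p in points]
    counts = Counter(cells)
    # Assumption: a cell is dense when it holds at least min_pts points.
    dense = {c for c, n in counts.items() if n >= min_pts}

    labels = []
    for p, c in zip(points, cells):
        if c in dense:
            labels.append("core")
            continue
        # Point lies in a sparse cell: find the nearest center among adjacent dense cells.
        best = math.inf
        for offset in product((-1, 0, 1), repeat=dim):
            nb = tuple(i + o for i, o in zip(c, offset))
            if nb in dense:
                center = tuple((i + 0.5) * cell_size for i in nb)
                best = min(best, math.dist(p, center))
        # Close enough to a dense cell center: boundary point; otherwise outlier.
        labels.append("boundary" if best <= max_dist else "outlier")
    return labels


sample = [(0.4, 0.5), (0.5, 0.5), (0.6, 0.5), (0.5, 0.4),  # dense cell (0, 0)
          (1.1, 0.5), (1.9, 0.5),                          # sparse cell (1, 0)
          (5.5, 5.5)]                                      # isolated point
labels = classify_points(sample, cell_size=1.0, min_pts=3, max_dist=0.8)
print(labels)
```

In this sketch, the two points in the sparse cell are separated by their distance to the center of the neighboring dense cell: the nearer one is kept as a boundary point and the farther one is treated as an outlier. Cell assignment and counting take a single pass over the data, which is in the spirit of the one-scan property claimed above.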