DBSCAN, a density-based clustering algorithm, is an effective spatial clustering method that can discover clusters of arbitrary shape and handle noise effectively. However, it has several drawbacks: (1) it clusters on spatial attributes only and ignores non-spatial attributes; (2) when applied to large-scale spatial databases, it requires a large amount of memory and incurs high I/O cost. Based on an analysis of these shortcomings, this paper proposes an improved density-based spatial clustering algorithm with sampling (IDBSCAS), which handles large-scale spatial databases effectively and takes both spatial and non-spatial attributes into account. Experimental results on 2-D spatial datasets show that the algorithm is feasible and effective.
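To make the two ideas concrete, the following is a minimal Python sketch of a DBSCAN-style procedure that also checks a non-spatial attribute and limits how many neighbors are expanded as seeds. It is not the IDBSCAS algorithm itself, since the abstract does not specify its details. The assumptions here are illustrative: each point carries one scalar non-spatial attribute, two points are neighbors only if they lie within `eps` spatially and their attributes differ by at most `delta`, and "sampling" means that at most `max_seeds` newly absorbed neighbors of a core point are queried further. The names `region_query`, `cluster_with_sampling`, `delta`, and `max_seeds` are hypothetical.

```python
import math
import random

NOISE = -1


def region_query(points, attrs, i, eps, delta):
    """Return indices of points that are neighbors of point i.

    Assumption: a neighbor must be within eps in space AND within
    delta in the (scalar) non-spatial attribute.
    """
    xi, yi = points[i]
    return [
        j
        for j, (xj, yj) in enumerate(points)
        if math.hypot(xi - xj, yi - yj) <= eps
        and abs(attrs[i] - attrs[j]) <= delta
    ]


def cluster_with_sampling(points, attrs, eps, min_pts, delta,
                          max_seeds=None, seed=0):
    """DBSCAN-style clustering with an attribute check and seed sampling.

    max_seeds=None disables sampling, so every absorbed point is queried.
    With sampling, all neighbors of a core point are still labeled, but
    only a random subset is queried further, which reduces the number of
    region queries at the cost of an approximate result.
    """
    rng = random.Random(seed)
    labels = [None] * len(points)  # None means "not yet visited"
    cluster_id = 0

    def absorb(neighbors):
        # Label unvisited or noise neighbors, then pick seeds to expand.
        new = [k for k in neighbors if labels[k] is None or labels[k] == NOISE]
        for k in new:
            labels[k] = cluster_id
        if max_seeds is not None and len(new) > max_seeds:
            new = rng.sample(new, max_seeds)
        return new

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, attrs, i, eps, delta)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be absorbed as a border point
            continue
        labels[i] = cluster_id
        queue = absorb(neighbors)
        while queue:
            j = queue.pop()
            neighbors_j = region_query(points, attrs, j, eps, delta)
            if len(neighbors_j) >= min_pts:  # j is a core point
                queue.extend(absorb(neighbors_j))
        cluster_id += 1
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
           (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
    att = [1, 1, 1, 1, 9, 2, 2, 2, 2, 1]
    print(cluster_with_sampling(pts, att, eps=1.5, min_pts=3, delta=1))
```

In this toy data, the point at (0.5, 0.5) sits in the middle of the first group spatially but has a very different attribute value, so it is excluded from that cluster when `delta` is small and joins it when `delta` is large. A production version would replace the linear scan in `region_query` with a spatial index, since the scan is what dominates memory and I/O on large databases.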