DBSCAN (density-based spatial clustering of applications with noise) is a density-based spatial clustering algorithm. It defines a cluster by density: the number of objects contained in a given region must be no less than a given threshold. Its main advantages are that it clusters quickly, handles noise effectively, and finds clusters of arbitrary shape. However, DBSCAN operates directly on the whole database and uses global parameters to characterize density, which leads to three notable deficiencies: it consumes large amounts of memory and I/O; clustering quality is poor when the spatial density is non-uniform, because one global parameter setting is applied everywhere; and it is highly sensitive to its input parameters. This work addresses these three deficiencies. First, according to the spatial distribution of the data, the whole data space is divided into several smaller partitions so that the local density within each partition is relatively more uniform. Second, an improved DBSCAN algorithm is applied to each partition; it adaptively selects the neighbors of a center point according to the distribution of the spatial data, then samples those neighbors and expands the cluster from them, which improves both the accuracy and the efficiency of clustering. Third, the per-partition clustering results are merged according to a set of merging rules. Finally, simulation experiments verify that the improved algorithm resolves the problems of excessive memory consumption, poor clustering quality, and sensitivity to global parameters.
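The partition, cluster, and merge pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the partitioning scheme, the adaptive neighbor selection, the sampling step, or the merging rules. The sketch therefore assumes fixed strips along the x-axis, a hand-chosen `eps` per strip standing in for adaptive selection, plain DBSCAN inside each strip, and a hypothetical merge rule that joins two clusters from different strips when any pair of their points lies within the smaller of the two `eps` values. All function and parameter names are illustrative.

```python
from math import dist

NOISE = -1

def dbscan(points, eps, min_pts):
    """Plain DBSCAN on a list of coordinate tuples; returns one label per point."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:          # not a core point (may become a border point later)
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:        # previously noise: now a border point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:        # expand only from core points
                queue.extend(nb)
        cluster += 1
    return labels

def partition_by_x(points, boundaries):
    """Assumed partitioning: strips split at sorted x boundaries."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for idx, p in enumerate(points):
        parts[sum(p[0] >= b for b in boundaries)].append(idx)
    return parts

def partitioned_dbscan(points, boundaries, eps_per_part, min_pts):
    """Cluster each strip with its own eps, then merge clusters across strips."""
    labels = [NOISE] * len(points)
    info = {}                             # global cluster id -> (strip index, eps used)
    next_id = 0
    parts = partition_by_x(points, boundaries)
    for k, (part, eps) in enumerate(zip(parts, eps_per_part)):
        local = dbscan([points[i] for i in part], eps, min_pts)
        for i, lab in zip(part, local):
            if lab != NOISE:
                labels[i] = next_id + lab
        n_local = max(local, default=NOISE) + 1
        for c in range(n_local):
            info[next_id + c] = (k, eps)
        next_id += n_local

    # Hypothetical merge rule, tracked with a small union-find over cluster ids.
    parent = list(range(next_id))

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for i in range(len(points)):          # O(n^2) pair scan, acceptable for a sketch
        for j in range(i + 1, len(points)):
            a, b = labels[i], labels[j]
            if a == NOISE or b == NOISE or info[a][0] == info[b][0]:
                continue
            if dist(points[i], points[j]) <= min(info[a][1], info[b][1]):
                parent[find(a)] = find(b)
    return [NOISE if lab == NOISE else find(lab) for lab in labels]
```

Calling `partitioned_dbscan(points, [0.0, 5.0], [0.6, 0.6, 2.5], 3)` clusters three strips with a tight radius in the two dense strips and a wider one in the sparse strip. A cluster that straddles the boundary at `x = 0` is first found as two local clusters and then joined by the merge step. Each strip can be processed without holding the other strips in memory, which is the motivation the abstract gives for partitioning.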