Density-based algorithm for discovering clusters in large spatial databases with noise(DBSCAN) is a classic kind of density-based spatial clustering algorithm and is widely applied in several aspects due to good performance in capturing arbitrary shapes and detecting outliers. However, in practice, datasets are always too massive to fit the serial DBSCAN. And a new parallel algorithm-Parallel DBSCAN(PDBSCAN) was proposed to solve the problem which DBSCAN faced. The proposed parallel algorithm bases on MapReduce mechanism. The usage of parallel mechanism in the algorithm focuses on region query and candidate queue processing which needed substantive computation resources. As a result, PDBSCAN is scalable for large-scale dataset clustering and is extremely suitable for applications in E-Commence, especially for recommendation.
Density-based algorithm for discovering clusters in large spatial databases with noise(DBSCAN) is a classic kind of densitybased spatial clustering algorithm and is widely applied in several aspects due to good performance in capturing arbitrary shapes and detecting outiiers. However, in practice, datasets are always too massive to fit the serial DBSCAN. And a new parallel algorithm--Parallel DBSCAN (PDBSCAN) was proposed to solve the problem which DBSCAN faced. The proposed parallel algorithm bases on MapReduce mechanism. The usage of parallel mechanism in the algorithm focuses on region query and candidate queue processing which needed substantive computation resources. As a result, PDBSCAN is scalable for large-scale dataset clustering and is extremely suitable for applications in E-Commence, especially for recommendation.