A Density-Based Distributed Clustering Algorithm

Journal: Journal of Nanjing University (南京大学学报), 2008, 44(5): 536-543
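Classification: TP311 [Automation and Computer Technology: Computer Software and Theory; Automation and Computer Technology: Computer Science and Technology]
Author affiliation: [1] School of Mathematics and Computer Science, Nanjing Normal University, Nanjing 210097
Funding: National Natural Science Foundation of China (40771163)
Related project: Research on GML-oriented spatial clustering analysis and anomaly detection methods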

Keywords: clustering, distributed clustering, density-based spatial clustering of applications with noise (DBSCAN), density-based distributed clustering (DBDC)

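Abstract (translated from the Chinese):

This paper improves the density-based distributed clustering algorithm DBDC (density based distributed clustering) and proposes a density-based distributed clustering algorithm, DBDC*. When selecting representative points locally, the algorithm applies the Bayesian information criterion (BIC) to obtain a small number of BIC core points that accurately reflect the data distribution at each local site, which effectively reduces the volume of data communicated during distributed clustering. Global clustering takes the data distribution at every site into account. Experimental results show that DBDC* is more efficient than DBDC and produces good clustering results.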
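
For reference, the abstract invokes BIC without stating it. The standard textbook definition is reproduced below; how the paper adapts it to score candidate core points is not given in this record, so the formula should be read as background only, not as the paper's selection rule.

```latex
\mathrm{BIC} = k \ln n - 2 \ln \hat{L}
```

Here $k$ is the number of free parameters of the model, $n$ is the number of observations, and $\hat{L}$ is the maximized likelihood. Under this sign convention a lower BIC indicates a better trade-off between fit and model complexity.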

English abstract:

With the spread of networked applications, large volumes of data are distributed across sites. Distributed clustering is a challenging research topic because of real-life constraints such as bandwidth and the memory available at each site. An effective density-based distributed clustering algorithm (DBDC*) is proposed to improve the efficiency of the distributed clustering algorithm DBDC. DBDC* uses the Bayesian Information Criterion to select only a small number of BIC core points to represent each local site, which effectively reduces network load and improves the quality of the global clustering. DBDC* is carried out on two levels, the local level and the global level. On the local level, all sites carry out a DBSCAN clustering independently of each other. Once clustering is complete, a local model consisting of BIC core points is determined. Next, each local model is transferred to a central site, where the local models are merged into a global model by analyzing the local BIC core points. A global cluster identifier is assigned to each local representative. The resulting global clustering is broadcast to all local sites, and all local models are then updated. Experimental results show that the efficiency of DBDC* is superior to that of DBDC.
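
The two-level procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The record does not say how BIC core points are chosen, so `local_model` substitutes a simple greedy rule (keep core points that lie more than `eps` apart) as a stand-in. The global radius of `2 * eps`, the global `min_pts` of 1, and every function name here are likewise assumptions made for illustration.

```python
from math import dist


def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns (labels, core_flags); label -1 means noise."""
    n = len(points)
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    core = [len(nb) >= min_pts for nb in neighbors]  # a point counts itself
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            if not core[p]:
                continue  # border points join a cluster but do not expand it
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels, core


def local_model(points, eps, min_pts):
    """Local level: cluster one site and keep a few core points.

    ASSUMPTION: greedy eps-spaced selection stands in for the paper's
    BIC-based choice of core points, which this record does not specify.
    """
    _, core = dbscan(points, eps, min_pts)
    reps = []
    for p, is_core in zip(points, core):
        if is_core and all(dist(p, r) > eps for r in reps):
            reps.append(p)
    return reps


def global_model(all_reps, eps_global):
    """Global level: merge the representatives received from all sites.

    ASSUMPTION: min_pts=1, so every representative belongs to some cluster.
    """
    labels, _ = dbscan(all_reps, eps_global, min_pts=1)
    return list(zip(all_reps, labels))


def relabel(points, model, reach):
    """Update step: a point takes the global id of its nearest representative."""
    if not model:
        return [-1] * len(points)
    out = []
    for p in points:
        rep, gid = min(model, key=lambda m: dist(p, m[0]))
        out.append(gid if dist(p, rep) <= reach else -1)
    return out


def cluster_sites(sites, eps, min_pts):
    """Run local clustering, central merge, broadcast, and local update."""
    all_reps = [r for site in sites for r in local_model(site, eps, min_pts)]
    model = global_model(all_reps, eps_global=2 * eps)  # assumed radius
    return [relabel(site, model, reach=2 * eps) for site in sites]


if __name__ == "__main__":
    site_a = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    site_b = [(2, 0), (2, 1), (3, 0), (3, 1), (20, 20), (20, 21), (21, 20)]
    print(cluster_sites([site_a, site_b], eps=1.5, min_pts=3))
```

In this toy run only three representatives cross the network instead of twelve points, and the cluster that straddles both sites receives a single global identifier after the merge. The update step uses a reach of `2 * eps` because, under the greedy rule above, every core point lies within `eps` of a representative and every border point lies within `eps` of a core point.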
