A Density-Based Distributed Clustering Algorithm
  • Journal: Journal of Nanjing University (南京大学学报), 2008, 44(5): 536-543
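  • Classification: TP311 [Automation and Computer Technology: Computer Software and Theory; Automation and Computer Technology: Computer Science and Technology]
  • Author affiliation: [1] School of Mathematics and Computer Science, Nanjing Normal University, Nanjing 210097
  • Funding: National Natural Science Foundation of China (40771163)
  • Related project: Research on GML-Oriented Spatial Clustering Analysis and Anomaly Detection Methods
Chinese abstract: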

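This paper improves the density-based distributed clustering algorithm DBDC (density based distributed clustering) and proposes a density-based distributed clustering algorithm, DBDC*. When selecting representative points locally, the algorithm incorporates the Bayesian Information Criterion (BIC) to obtain a small number of BIC core points that accurately reflect the data distribution at each local site. This effectively reduces the volume of data communicated during distributed clustering, and the global clustering takes the data distribution of every site into account. Experimental results show that DBDC* is more efficient than DBDC and produces good clustering results.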
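
For reference, the Bayesian Information Criterion named in the abstract is commonly written as below. This is only the standard textbook definition; how DBDC* applies it when screening core points is not stated in the abstract. The symbols are the usual ones and are not taken from the paper: k is the number of free model parameters, n is the number of data points, and \hat{L} is the maximized likelihood of the model. Under this sign convention, a lower value indicates a better trade-off between fit and model complexity.

```latex
\mathrm{BIC} = k \ln n - 2 \ln \hat{L}
```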

English abstract:

With the spread of networked applications, large volumes of data are distributed across sites. Distributed clustering is a challenging research topic because of real-world constraints such as bandwidth and the memory available at each site. An effective density-based distributed clustering algorithm, DBDC*, is proposed to improve the efficiency of the distributed clustering algorithm DBDC. DBDC* incorporates the Bayesian Information Criterion and selects only a small number of BIC core points to represent each local site, which effectively decreases network load and improves the quality of the global clustering. DBDC* is carried out on two levels: the local level and the global level. On the local level, all sites carry out a DBSCAN clustering independently of each other. Once the clustering is complete, a local model consisting of BIC core points is determined. Next, the local model is transferred to a central site, where the local models are merged into a global model by analyzing the local BIC core points. A global cluster identifier is assigned to each local representative. The resulting global clustering is broadcast to all local sites, and all local models are then updated. Experimental results show that DBDC* is more efficient than DBDC.
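
The two-level workflow described in the abstract (local DBSCAN, a compact local model, merging at a central site, relabeling at each site) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. In particular, the abstract does not say how BIC is used to pick core points, so `pick_representatives` is a hypothetical stand-in that simply keeps core points lying more than `eps` apart. The function names, the choice of `eps_global = 2 * eps`, and `min_pts=1` for the global step are likewise assumptions made for the example.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns (labels, core_flags); label -1 means noise."""
    labels = [None] * len(points)
    core = [False] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        core[i] = True
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = region_query(points, j, eps)
            if len(nb) >= min_pts:
                core[j] = True
                queue.extend(nb)
    return labels, core

def pick_representatives(points, core, eps):
    """ASSUMPTION: stand-in for the BIC-based selection, which the abstract
    does not specify. Keeps core points that are more than eps apart."""
    reps = []
    for p, is_core in zip(points, core):
        if is_core and all(math.dist(p, r) > eps for r in reps):
            reps.append(p)
    return reps

def global_model(all_reps, eps_global):
    """Central site: cluster the representatives received from all sites.
    ASSUMPTION: min_pts=1, so every representative gets a global id."""
    labels, _ = dbscan(all_reps, eps_global, min_pts=1)
    return labels

def relabel(points, local_labels, reps, rep_labels):
    """Local site: each clustered point takes the global id of its nearest
    representative; local noise stays noise."""
    out = []
    for p, lab in zip(points, local_labels):
        if lab == -1:
            out.append(-1)
        else:
            k = min(range(len(reps)), key=lambda r: math.dist(p, reps[r]))
            out.append(rep_labels[k])
    return out

# Two hypothetical sites whose local clusters touch, plus one outlier.
eps, min_pts = 1.1, 3
site_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
site_b = [(2, 0), (2, 1), (3, 0), (3, 1), (10, 10)]

# Local level: each site clusters independently and builds its local model.
labels_a, core_a = dbscan(site_a, eps, min_pts)
labels_b, core_b = dbscan(site_b, eps, min_pts)
reps_a = pick_representatives(site_a, core_a, eps)
reps_b = pick_representatives(site_b, core_b, eps)

# Global level: only the representatives travel to the central site.
all_reps = reps_a + reps_b
rep_labels = global_model(all_reps, eps_global=2 * eps)

# The global result is sent back and each site updates its labels.
global_a = relabel(site_a, labels_a, all_reps, rep_labels)
global_b = relabel(site_b, labels_b, all_reps, rep_labels)
print(len(all_reps), "representatives sent for", len(site_a) + len(site_b), "points")
print(global_a, global_b)
```

In this toy run each site finds one local cluster, and the central site merges them into a single global cluster because their representatives lie within `eps_global` of each other. Only the representatives cross the network, which is the communication saving the abstract attributes to a small local model.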
