Dongli Research Big Data Discovery System (DRDS)


An Optimized Grid-Based Clustering Algorithm

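Citation: Liu Junling, Sun Huanliang, Wang Daling, Niu Zhicheng. An Optimized Grid-Based Clustering Algorithm. Journal of Chinese Computer Systems (小型微型计算机系统), 27(10), 19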
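Classification: TP311 [Automation and Computer Technology: Computer Software and Theory; Automation and Computer Technology: Computer Science and Technology]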
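Affiliations: [1] Computing Center, Shenyang Jianzhu University, Shenyang, Liaoning 110168; [2] School of Information and Control Engineering, Shenyang Jianzhu University, Shenyang, Liaoning 110168; [3] School of Information Science and Engineering, Northeastern University, Shenyang, Liaoning 110004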
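Funding: supported by the National Natural Science Foundation of China (60573090); the Natural Science Foundation of Liaoning Province (20052006); and the Key Research Program of the Liaoning Provincial Department of Education (05L354).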
Related project: Research on a user-motivation inference model for next-generation search engines

Keywords: data mining, clustering analysis, CD-Tree, cell-based (grid-based) algorithm

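Abstract (translated from the Chinese):

Clustering is an important research topic in data mining. Compared with other algorithms, grid-based clustering algorithms can process massive low-dimensional datasets efficiently. However, because the number of cells grows exponentially with the dimensionality of the data, a high-dimensional dataset produces too many cells, which makes the algorithm inefficient. This paper designs a new grid-based clustering algorithm built on the CD-Tree, whose efficiency is far higher than that of traditional grid-based clustering algorithms. In addition, the paper designs a pruning strategy to improve the algorithm's efficiency further. Experiments show that, compared with traditional clustering algorithms, the CD-Tree-based algorithm scales significantly better with both dataset size and dimensionality.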
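To make the exponential growth concrete, the short Python sketch below compares the number of cells in a full grid with the number of cells that actually contain data. It assumes a uniform grid with k equal-width intervals per dimension over [0, 1)^d; the helper names `full_grid_cells` and `occupied_cells` are hypothetical and are not taken from the paper. A full grid has k^d cells, so k = 10 and d = 10 already gives 10^10 cells, while n points can occupy at most n distinct cells. That gap is the motivation for a structure that stores only non-empty cells.

```python
import random


def full_grid_cells(k: int, d: int) -> int:
    """Cells in a full uniform grid with k intervals per dimension: k ** d."""
    return k ** d


def occupied_cells(points, k: int, lo: float = 0.0, hi: float = 1.0) -> int:
    """Count distinct non-empty cells (hypothetical helper; never exceeds len(points))."""
    width = (hi - lo) / k
    cells = {
        # Map each coordinate to its interval index, clamped to [0, k - 1].
        tuple(min(max(int((x - lo) // width), 0), k - 1) for x in p)
        for p in points
    }
    return len(cells)


random.seed(0)
pts = [[random.random() for _ in range(10)] for _ in range(1000)]
print(full_grid_cells(10, 10))   # 10000000000
print(occupied_cells(pts, 10))   # bounded by len(pts) == 1000
```

Storing cells in a set keyed by coordinate tuple is only an illustration of "store what is occupied"; it is not the CD-Tree.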

Abstract (English, as published):

Clustering is an important issue in data mining. Compared with other algorithms, cell-based clustering algorithms can be applied to low-dimensional data. However, in cell-based algorithms the number of cells increases exponentially with the dimensionality, so these algorithms are inefficient on high-dimensional data because of the large number of cells. This paper proposes a new clustering algorithm based on the CD-Tree, which greatly improves the efficiency of the cell-based approach. To improve efficiency further, we design a pruning strategy that removes non-dense cells before the clustering procedure. Extensive experiments on real and synthetic datasets show that the algorithm scales better than other cell-based clustering algorithms.
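The general cell-based pipeline described above (assign points to cells, prune non-dense cells, then join adjacent dense cells into clusters) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' CD-Tree algorithm: a hash map of occupied cells stands in for the CD-Tree, `grid_cluster`, `k`, and `density_threshold` are hypothetical names, the grid is uniform over [0, 1)^d, and two cells are treated as adjacent only when they differ by one step in a single dimension.

```python
from collections import Counter, deque


def grid_cluster(points, k, density_threshold, lo=0.0, hi=1.0):
    """Hypothetical sketch of cell-based clustering with pruning.

    A Counter keyed by cell coordinates stands in for the CD-Tree:
    only non-empty cells are ever stored. Returns one label per point
    (-1 for points in pruned cells) and the number of clusters.
    """
    width = (hi - lo) / k

    def cell_of(p):
        # Interval index per dimension, clamped to [0, k - 1].
        return tuple(min(max(int((x - lo) // width), 0), k - 1) for x in p)

    point_cells = [cell_of(p) for p in points]
    counts = Counter(point_cells)

    # Pruning step: discard non-dense cells before clustering.
    dense = {c for c, n in counts.items() if n >= density_threshold}

    # Breadth-first search joins face-adjacent dense cells into clusters.
    cell_label = {}
    n_clusters = 0
    for start in dense:
        if start in cell_label:
            continue
        cell_label[start] = n_clusters
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for dim in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:dim] + (cell[dim] + step,) + cell[dim + 1:]
                    if nb in dense and nb not in cell_label:
                        cell_label[nb] = n_clusters
                        queue.append(nb)
        n_clusters += 1

    return [cell_label.get(c, -1) for c in point_cells], n_clusters


pts = [(0.05, 0.05)] * 5 + [(0.15, 0.05)] * 5 + [(0.95, 0.95)] * 5 + [(0.55, 0.55)]
labels, n = grid_cluster(pts, k=10, density_threshold=3)
print(n)  # 2: the two adjacent dense cells merge; the lone point is pruned
```

Checking only the 2d face neighbors keeps neighbor enumeration linear in the dimensionality instead of the 3^d - 1 neighbors of full adjacency, which matters for the high-dimensional case the abstract targets; whether the paper uses the same adjacency rule is not stated in this record.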
