A Subspace Clustering Algorithm for High-Dimensional Categorical Data

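ISSN: 1001-4217
Journal: Journal of Shantou University (Natural Science Edition) [汕头大学学报(自然科学版)]
Date: August 2014
Pages: 51-59
Classification: TP391.4 [Automation and Computer Technology > Computer Application Technology; Automation and Computer Technology > Computer Science and Technology]
Author affiliation: [1] College of Engineering, Shantou University, Shantou, Guangdong 515063
Funding: National Natural Science Foundation of China (61170130)
Related project: Clustering of High-Dimensional Mixed-Type Data and Its Applications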

Keywords: subspace, clustering, high-dimensional, information entropy

Chinese abstract (translated):

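Subspace clustering restricts the search to the locally relevant dimensions, which lets it overcome the difficulty of clustering in the full space when the data have very many dimensions. For high-dimensional categorical data, this paper proposes a bottom-up hierarchical subspace clustering algorithm. The algorithm builds a global most-similar linear list that records, for each cluster, its similarity to its most similar cluster. During clustering, the most similar pair of clusters is selected and merged, and the list is then maintained so that it continues to yield each cluster's most similar cluster. Using information entropy as the criterion, the algorithm can locate the subspace of each cluster fairly accurately. Experiments on two typical categorical data sets, Zoo and Soybean, show that the algorithm achieves better clustering accuracy and stability than other related clustering algorithms.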
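The list-driven merging loop described above can be sketched as follows. This is a minimal sketch, not the authors' implementation: the abstract does not state the similarity measure, so `shared_pure_attrs` is a hypothetical stand-in that counts the attributes on which every object of the two clusters takes the same value (zero entropy in the merged cluster). The function names, the tie-breaking rule, and the stopping parameter `k` are likewise assumptions for illustration.

```python
def shared_pure_attrs(a, b, data):
    """Hypothetical similarity: the number of attributes on which every object
    in clusters a and b has the same value (entropy 0 in the merged cluster)."""
    members = a + b
    return sum(
        1 for d in range(len(data[0]))
        if len({data[i][d] for i in members}) == 1
    )


def msll_cluster(data, k):
    """Bottom-up clustering of categorical rows into k clusters (k >= 1),
    using a most-similar linear list (MSLL) to pick the next pair to merge."""
    clusters = {i: [i] for i in range(len(data))}  # cluster id -> row indices

    def best_partner(c):
        # Scan all other clusters; return (similarity, id) of the most similar,
        # or None when c is the only cluster left.
        best = None
        for o in clusters:
            if o != c:
                s = shared_pure_attrs(clusters[c], clusters[o], data)
                if best is None or s > best[0]:
                    best = (s, o)
        return best

    # MSLL: one entry per cluster, holding (similarity, most similar cluster).
    msll = {c: best_partner(c) for c in clusters}

    while len(clusters) > k:
        # The entry with the highest similarity gives the pair to merge
        # (ties go to the smallest cluster id).
        c = max(msll, key=lambda x: (msll[x][0], -x))
        o = msll[c][1]
        keep, drop = min(c, o), max(c, o)
        clusters[keep] = clusters[keep] + clusters[drop]
        del clusters[drop], msll[drop]

        # Maintain the list after the merge.
        for x in clusters:
            if x == keep or msll[x][1] in (keep, drop):
                # Old best partner changed or vanished: rescan.
                msll[x] = best_partner(x)
            else:
                # Old best partner is untouched; only the merged cluster is new.
                s = shared_pure_attrs(clusters[x], clusters[keep], data)
                if s > msll[x][0]:
                    msll[x] = (s, keep)

    return sorted(sorted(m) for m in clusters.values())
```

For example, with the rows ("a", "x", "p"), ("a", "x", "q"), ("b", "y", "r"), ("b", "y", "s") and `k = 2`, the sketch returns `[[0, 1], [2, 3]]`. Only list entries that pointed at one of the two merged clusters are rescanned; every other entry just compares its stored best against the new merged cluster, which is what makes maintaining the list cheaper than re-examining all pairs at each step.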

English abstract:

Subspace clustering is a kind of clustering algorithm that searches for information within the scope of locally relevant dimensions. It can overcome the difficulties caused by high-dimensional data sets. In this paper, a bottom-up hierarchical subspace clustering algorithm for high-dimensional categorical data is proposed. The algorithm creates a most similar linear list (MSLL) that records the similarity between each cluster and its most similar cluster. During clustering, the two clusters with the maximum similarity are merged, and the information about each cluster's most similar cluster is kept up to date in the MSLL. Based on information entropy, the algorithm can search the subspace of each cluster precisely. Experiments on the Zoo and Soybean data sets show better precision and stability than other related algorithms.
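The entropy-based subspace search can be illustrated with a small sketch, assuming (hypothetically) that a cluster's subspace is the set of attributes whose Shannon entropy within the cluster does not exceed a threshold. The functions `attribute_entropy` and `cluster_subspace` and the `threshold` parameter are illustrative only; the abstract does not give the exact criterion.

```python
import math
from collections import Counter


def attribute_entropy(values):
    """Shannon entropy (in bits) of one categorical attribute within a cluster."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def cluster_subspace(rows, threshold=0.5):
    """Hypothetical criterion: return the indices of the attributes whose
    entropy over the cluster's rows is at most `threshold`."""
    n_attrs = len(rows[0])
    return [
        d for d in range(n_attrs)
        if attribute_entropy([r[d] for r in rows]) <= threshold
    ]
```

For a four-object cluster in which attribute 0 is constant (0 bits), attribute 1 takes the values x, x, y, x (about 0.81 bits), and attribute 2 is all distinct (2 bits), `cluster_subspace(rows)` returns `[0]`, and raising the threshold to 1.0 returns `[0, 1]`. Low entropy means the cluster's objects largely agree on that attribute, so those attributes are the natural candidates for the cluster's subspace.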

同期刊论文项目