Subspace clustering restricts the search for clusters to the relevant dimensions, which effectively overcomes the difficulty of clustering high-dimensional data in the full space. For high-dimensional categorical data, this paper proposes a bottom-up hierarchical subspace clustering algorithm. The algorithm builds a global most similar linear list (MSLL) that records, for each cluster, the similarity between that cluster and its most similar cluster. At each step of clustering, the two most similar clusters are merged, and the next most similar pair is obtained by maintaining the MSLL. In the sense of information entropy, the algorithm can search for the subspace of each cluster fairly accurately. Experiments on two typical categorical data sets, Zoo and Soybean, show that the algorithm is superior to other related clustering algorithms in both clustering accuracy and stability.
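
The bottom-up merging driven by the MSLL can be illustrated with a minimal Python sketch. The abstract does not give the similarity measure or the subspace criterion, so everything specific here is an assumption made for illustration only: the function names (`similarity`, `subspace`, `msll_clustering`), the placeholder similarity (negated mean per-dimension entropy of the merged cluster), the entropy threshold of 0.5 bits, and the stopping rule of a target cluster count `k`. It is not the paper's actual definition.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of categorical values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def dim_entropies(rows):
    """Per-dimension entropy of a cluster given as a list of equal-length tuples."""
    return [entropy(col) for col in zip(*rows)]


def similarity(a, b):
    """ASSUMED placeholder: negated mean per-dimension entropy of the union."""
    ents = dim_entropies(a + b)
    return -sum(ents) / len(ents)


def subspace(rows, max_entropy=0.5):
    """Dimensions on which a cluster is nearly pure (threshold is an assumption)."""
    return [d for d, h in enumerate(dim_entropies(rows)) if h <= max_entropy]


def most_similar(cid, clusters):
    """Return (similarity, partner id) for the best partner of cluster `cid`."""
    return max((similarity(clusters[cid], rows), other)
               for other, rows in clusters.items() if other != cid)


def msll_clustering(data, k):
    """Merge the most similar pair of clusters until only k clusters remain."""
    k = max(k, 1)
    clusters = {i: [tuple(row)] for i, row in enumerate(data)}
    # MSLL: one entry per cluster, holding its most similar cluster.
    msll = {}
    if len(clusters) > 1:
        msll = {cid: most_similar(cid, clusters) for cid in clusters}
    next_id = len(clusters)
    while len(clusters) > k:
        # The entry with the highest similarity identifies the pair to merge.
        a = max(msll, key=lambda cid: msll[cid][0])
        b = msll[a][1]
        merged = clusters.pop(a) + clusters.pop(b)
        del msll[a], msll[b]
        clusters[next_id] = merged
        # Maintain the list instead of recomputing every pair.
        for cid in list(msll):
            sim, partner = msll[cid]
            if partner in (a, b):
                # The old partner no longer exists: search again.
                msll[cid] = most_similar(cid, clusters)
            else:
                # Only the new cluster can beat the stored partner.
                s = similarity(clusters[cid], merged)
                if s > sim:
                    msll[cid] = (s, next_id)
        if len(clusters) > 1:
            msll[next_id] = most_similar(next_id, clusters)
        next_id += 1
    return list(clusters.values())
```

Calling `msll_clustering(rows, k)` on a list of categorical tuples returns the remaining clusters as lists of rows, and `subspace(cluster)` then reports the low-entropy dimensions of each one. Keeping one entry per cluster means that after a merge only the entries that pointed at a merged cluster need a full search; every other entry is compared against the new cluster alone.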