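High-dimensional clustering algorithms built on equal-width or random-width grid density units cannot guarantee the quality of clustering results on complex data sets. Building on kernel density estimation and spatial statistics theory, this paper presents a high-dimensional clustering algorithm based on local significant units to handle high-dimensional clustering of complex data. The method defines a local significant unit structure, grounded in local kernel density estimation and spatial statistics theory, to capture the local data distribution; designs a greedy algorithm that quickly discovers the local significant regions covering the data distribution; and runs the single-linkage algorithm on local significant units that share the same attribute subset to find the clusters among them. Experimental results show that the high-dimensional clustering algorithm based on local significant units can discover the high-quality clusters hidden in complex data sets.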
High-dimensional clustering algorithms based on equal-width or random-width density grid units cannot guarantee high-quality clustering results on complicated data sets. In this paper, a High-dimensional Clustering algorithm based on Local Significant Units (HC_LSU) is proposed to deal with this problem, building on kernel density estimation and spatial statistical theory. Firstly, a structure called the Local Significant Unit (LSU) is introduced, defined through local kernel density estimation and a spatial statistical test, to capture the local data distribution; secondly, a greedy algorithm named Greedy Algorithm for LSU (GA_LSU) is proposed to quickly find the local significant units in the data set; and finally, the single-linkage algorithm is run on the local significant units that share the same attribute subset to generate the clustering results. Experimental results on 4 synthetic and 6 real-world data sets show that the proposed high-dimensional clustering algorithm, HC_LSU, can effectively find high-quality clustering results in highly complicated data sets.
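The general idea of filtering by local density and then linking what remains can be illustrated with a small one-dimensional sketch. This is not HC_LSU or GA_LSU: the abstract does not give the LSU definition, the statistical test, or the greedy search, so the sketch makes several assumptions. A plain density threshold stands in for the spatial statistical test, individual data points stand in for local significant units, and the function names `gaussian_kde`, `significant_points` and `single_linkage`, together with the `bandwidth`, `threshold` and `cutoff` parameters, are hypothetical.

```python
import math


def gaussian_kde(points, x, bandwidth):
    """Gaussian kernel density estimate of the 1-D sample `points` at `x`."""
    norm = len(points) * bandwidth * math.sqrt(2 * math.pi)
    total = sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)
    return total / norm


def significant_points(points, bandwidth, threshold):
    """Keep the points whose estimated local density exceeds `threshold`.

    Assumption: a fixed threshold replaces the statistical significance
    test that the paper applies to its local significant units.
    """
    return [p for p in points if gaussian_kde(points, p, bandwidth) > threshold]


def single_linkage(values, cutoff):
    """Single-linkage clustering of 1-D values with a distance cutoff.

    In one dimension, single linkage cut at `cutoff` is equivalent to
    splitting the sorted values wherever a gap exceeds `cutoff`.
    """
    clusters = []
    for v in sorted(values):
        if clusters and v - clusters[-1][-1] <= cutoff:
            clusters[-1].append(v)  # close enough to the previous value
        else:
            clusters.append([v])    # gap too large: start a new cluster
    return clusters


# Two dense groups plus one isolated point at 9.0.
data = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 9.0]
dense = significant_points(data, bandwidth=0.2, threshold=0.4)
clusters = single_linkage(dense, cutoff=0.5)
print(clusters)
```

In this toy data the isolated point at 9.0 has low estimated density and is dropped before linking, so only the two dense groups remain as clusters. A real high-dimensional setting would additionally need the per-subspace handling ("same attribute subset") that the abstract mentions, which this sketch omits.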