Uncertain data clustering extends traditional data mining and has attracted wide research interest. The problem is usually formulated as a combinatorial optimization problem and solved with heuristic clustering algorithms. Existing heuristic clustering algorithms for uncertain data, such as UK-means and UK-Medoids, are easy to understand and to implement, but their sensitivity to the initial solution severely degrades clustering quality. Building on approximate backbone theory, this paper proposes APPGCU (Approximate backbone guided heuristic clustering algorithm for uncertain data). The algorithm first samples the original dataset P times and runs UK-Medoids on each of the P smaller sampled datasets to obtain P locally optimal solutions. It then intersects these P local optima to obtain the approximate backbone and extracts the initial cluster centers from it. Finally, starting from these initial centers, it re-runs UK-Medoids as a heuristic search to obtain the clustering result. Experiments on synthetic datasets and on real datasets (standard UCI datasets with uncertainty) show that APPGCU produces clearly better clustering results than the heuristic clustering algorithms it was compared against, improving clustering quality.
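The three stages described above (sampling, intersecting the local optima, and restarting from the extracted centers) can be sketched roughly as follows. This is only an illustrative sketch under assumptions that do not come from the paper: each uncertain object is represented by a fixed number of sample points, a plain alternating k-medoids on expected pairwise distances stands in for UK-Medoids, and the "intersection" is taken to mean the groups of objects that fall in the same cluster in all P local solutions. The function names `expected_distances`, `k_medoids`, `medoid_of`, and `appgcu_sketch`, and the parameters `frac` and `seed`, are hypothetical.

```python
import numpy as np

def expected_distances(objects):
    """objects has shape (n, m, d): n uncertain objects, each given by m sample points."""
    n = objects.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        # Differences between every sample of object i and every sample of each object j.
        diff = objects[i][None, :, None, :] - objects[:, None, :, :]
        D[i] = np.sqrt((diff ** 2).sum(axis=-1)).mean(axis=(1, 2))
    np.fill_diagonal(D, 0.0)  # assumption: an object is at distance 0 from itself
    return D

def medoid_of(D, members):
    """Member with the smallest total distance to the other members."""
    return members[D[np.ix_(members, members)].sum(axis=0).argmin()]

def k_medoids(D, medoids, max_iter=100):
    """Alternating k-medoids on a distance matrix (stand-in for UK-Medoids)."""
    medoids = np.array(medoids)
    for _ in range(max_iter):
        labels = D[:, medoids].argmin(axis=1)
        new = medoids.copy()
        for c in range(len(medoids)):
            members = np.flatnonzero(labels == c)
            if members.size:
                new[c] = medoid_of(D, members)
        if np.array_equal(new, medoids):
            break
        medoids = new
    return medoids, D[:, medoids].argmin(axis=1)

def appgcu_sketch(objects, k, P=5, frac=0.5, seed=0):
    rng = np.random.default_rng(seed)
    D = expected_distances(objects)
    n = D.shape[0]
    size = min(n, max(k, int(frac * n)))
    solutions = []
    for _ in range(P):
        # Stage 1: run k-medoids on a random subset from a random start.
        sub = rng.choice(n, size=size, replace=False)
        start = rng.choice(size, size=k, replace=False)
        local, _ = k_medoids(D[np.ix_(sub, sub)], start)
        medoids = sub[local]  # map subset indices back to the full dataset
        # Extend the local solution to all objects so that solutions are comparable.
        solutions.append(D[:, medoids].argmin(axis=1))
    # Stage 2 (assumed reading of "intersection"): objects with identical
    # labels in all P solutions form one block of the approximate backbone.
    signatures = np.stack(solutions, axis=1)
    _, block, counts = np.unique(signatures, axis=0,
                                 return_inverse=True, return_counts=True)
    block = block.ravel()
    largest = np.argsort(counts)[::-1][:k]
    init = [medoid_of(D, np.flatnonzero(block == b)) for b in largest]
    # Stage 3: heuristic search on the full dataset from the extracted centers.
    return k_medoids(D, init)
```

A call such as `medoids, labels = appgcu_sketch(objects, k=3, P=4, frac=0.6)` returns the indices of the final cluster centers and a cluster label for every object. Taking the k largest backbone blocks as the source of initial centers is one simple choice; the paper may extract the centers differently.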