大量初始聚类结果之间存在强弱不同的相似性,会给用户理解与描述聚类结果带来不利影响,进而阻碍数据挖掘后续工作的顺利展开.传统聚类算法由于注重聚类形状及空间邻接性,或者考虑全局数据分布密度的均匀性,实际中均难以解决这一类问题.为此,提出了基于分形的聚类层次优化算法FCHO(fractal-based cluster hierarchy optimization),FCHO算法基于多重分形理论,利用聚类对应多重分形维数及聚类合并之后多重分形维数的变化程度来度量初始聚类之间的相似程度,最终生成反映数据自然聚集状态的聚类家族树.此外,初步分析了算法的时空复杂性,基于合成数据集和标准数据集的有关实验工作证实了算法的有效性.
A cluster is a collection of data objects that are similar to one another within the same cluster and are dissimilar to the objects in other clusters. Moreover, there will exist more or less similarities among these large amounts of initial cluster results in real life data set. Accordingly, analyzer may have difficulty to implement further analysis if they know nothing about these similarities. Therefore, it is very valuable to analyze these similarities and construct the hierarchy structures of the initial clusters. The traditional cluster methods are unfit for this cluster post-processing problem for their favor of finding the convex cluster result, impractical hypothesis and multiple scans of the data set. Based on multifractal theory, this paper proposes the FCHO (fractal-based cluster hierarchy optimization) algorithm, which integrates the cluster similarity with cluster shape and cluster distribution to construct the cluster hierarchy tree from the disjoint initial clusters. The elementary time-space complexity of the FCHO algorithm is presented. Several comparative experiments using synthetic and real life data set show the performance and the effectivity of FCHO.