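Agglomerative hierarchical clustering is an important class of methods in cluster analysis for discovering the latent structure of a data set. A key factor affecting clustering quality in these methods is how the distance between sub-clusters is measured. Used as a similarity measure between sub-clusters, the distance based on Renyi's entropy can be computed by non-parametric estimation, and it makes effective use of the information provided by all samples in the sub-clusters, so it describes the data distribution within each sub-cluster more fully. Experimental results show that, on two representative artificial data sets, the inter-cluster distance measure based on Renyi's entropy gives better hierarchical clustering results than three traditional measures. In addition, when an image has been over-segmented, merging the sub-regions according to the Renyi entropy distance can recover reasonable segmentation targets.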
For agglomerative hierarchical clustering, we propose to measure the similarity between two clusters based on Renyi's "cross" entropy as defined in Information Theoretic Learning. The similarity value is calculated through non-parametric estimation, and the measure takes all the samples in both clusters into account. Experimental results show that, on two typical artificial data sets, the similarity measure based on Renyi's entropy performs better in agglomerative clustering than three traditional measures, and that it is also useful for finding the target object in over-segmented images.
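The non-parametric estimate mentioned above can be illustrated with a minimal sketch. The abstract does not state the estimator, so the following assumes the quadratic form commonly used in Information Theoretic Learning: a Parzen-window estimate with an isotropic Gaussian kernel of width `sigma`, where the cross information potential is the average kernel value over all cross-cluster sample pairs and the cross entropy is its negative logarithm. The function names, the `sigma` parameter, and the merge rule in `most_similar_pair` are hypothetical choices for illustration, not necessarily those of the paper.

```python
import numpy as np


def cross_information_potential(a, b, sigma=1.0):
    """Parzen estimate of the integral of p_a(x) * p_b(x) (assumed quadratic form).

    a, b: arrays of shape (n_samples, n_features), one per cluster.
    sigma: Gaussian kernel width (an assumption; not given in the abstract).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    d = a.shape[1]
    # Squared Euclidean distance between every sample of a and every sample of b.
    diff = a[:, None, :] - b[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Convolving two Gaussian kernels of variance sigma^2 gives variance 2*sigma^2.
    var = 2.0 * sigma ** 2
    norm = (2.0 * np.pi * var) ** (-d / 2.0)
    # Every sample in both clusters contributes to the average.
    return float(np.mean(norm * np.exp(-sq_dist / (2.0 * var))))


def renyi_cross_entropy(a, b, sigma=1.0):
    """Negative log of the cross information potential; smaller means more similar."""
    return float(-np.log(cross_information_potential(a, b, sigma)))


def most_similar_pair(clusters, sigma=1.0):
    """Return indices (i, j), i < j, of the pair with the largest cross potential.

    This is one candidate merge step of an agglomerative procedure.
    """
    best_pair, best_value = None, -np.inf
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            value = cross_information_potential(clusters[i], clusters[j], sigma)
            if value > best_value:
                best_pair, best_value = (i, j), value
    return best_pair
```

Comparing potentials rather than entropies in `most_similar_pair` gives the same ordering while avoiding the logarithm of values that underflow to zero for very distant clusters. In practice the kernel width would need to be chosen to suit the scale of the data.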