Because the Hausdorff distance defines a distance between two compact sets, an interval number can be treated as a compact set, which yields a distance between interval numbers. This distance is extended to interval vectors, giving the distance between two samples in cluster analysis, and the Hausdorff distance between two clusters is then defined. To eliminate the influence of differing scales (units of measurement) on the clustering result, the standardization of interval data is studied. On this basis, a hierarchical clustering algorithm for interval data is proposed. A simulation study is conducted to evaluate the method; the results show that clustering based on the Hausdorff distance is more effective than clustering based on the traditional Euclidean distance under all of the experimental conditions designed in the simulation. Finally, interval data are constructed following the ideas of symbolic data analysis, and an example is given in which several groups of animals are clustered according to physiological characteristics such as height and weight.
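As an illustration of the distances named above, the following is a minimal sketch rather than the paper's implementation. For two closed intervals [a1, b1] and [a2, b2], the Hausdorff distance reduces to max(|a1 - a2|, |b1 - b2|). How the component distances are combined for interval vectors is not stated in the abstract, so the Euclidean-style aggregation used here is an assumption, as are all function names. The cluster distance uses the standard Hausdorff definition for finite sets.

```python
def interval_hausdorff(a, b):
    """Hausdorff distance between closed intervals a = (lo, hi) and b = (lo, hi)."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def vector_distance(x, y):
    """Distance between two interval vectors (samples).

    ASSUMPTION: component-wise Hausdorff distances are combined with a
    Euclidean norm; the paper may use a different aggregation.
    """
    return sum(interval_hausdorff(a, b) ** 2 for a, b in zip(x, y)) ** 0.5


def cluster_hausdorff(A, B, dist=vector_distance):
    """Hausdorff distance between two finite clusters of interval vectors."""
    def directed(P, Q):
        # Largest distance from a point of P to its nearest point in Q.
        return max(min(dist(p, q) for q in Q) for p in P)

    return max(directed(A, B), directed(B, A))


# Hypothetical samples, each a (height interval, weight interval) pair.
x = [(0.0, 1.0), (0.0, 2.0)]
y = [(3.0, 4.0), (4.0, 6.0)]
print(interval_hausdorff((1.0, 3.0), (2.0, 6.0)))
print(vector_distance(x, y))
print(cluster_hausdorff([x], [y, x]))
```

In an agglomerative (hierarchical) procedure, cluster_hausdorff would play the role of the linkage criterion: at each step the two clusters with the smallest distance are merged. Standardization of the interval data would be applied before any of these distances are computed.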