Interval data are an important type of symbolic data. Most of the existing literature assumes that the points within an interval are uniformly distributed, which limits the applicability of interval-valued symbolic data. This paper instead assumes a general distribution. We first propose an extended Hausdorff distance for generally distributed interval-valued symbolic data and, based on it, present a SOM clustering algorithm for such data. Random simulation experiments indicate that the SOM clustering algorithm based on the extended Hausdorff distance is more effective than the SOM clustering algorithms based on the traditional Hausdorff distance and on the μσ distance. Finally, we apply the method to the clustering of meteorological data to illustrate its application steps and operability, and to further evaluate its effectiveness on a practical problem.
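For orientation, the traditional Hausdorff baseline mentioned above can be sketched as follows. For two closed intervals [a1, b1] and [a2, b2], the Hausdorff distance reduces to max(|a1 - a2|, |b1 - b2|). Summing over variables and the function names `hausdorff_interval` and `best_matching_unit` are assumptions made for illustration. The extended Hausdorff distance is not defined in this abstract, so it is not reproduced here.

```python
def hausdorff_interval(x, y):
    """Traditional Hausdorff distance between two interval-valued observations.

    x, y: sequences of (lower, upper) pairs, one pair per variable.
    Per variable the distance is max(|a1 - a2|, |b1 - b2|); summing over
    variables is an assumption of this sketch, not taken from the paper.
    """
    return sum(
        max(abs(a1 - a2), abs(b1 - b2))
        for (a1, b1), (a2, b2) in zip(x, y)
    )


def best_matching_unit(sample, prototypes):
    """Index of the SOM prototype closest to `sample` under the distance above."""
    distances = [hausdorff_interval(sample, w) for w in prototypes]
    return min(range(len(prototypes)), key=distances.__getitem__)


# Usage: one observation with two interval-valued variables, two prototypes.
sample = [(0, 2), (10, 12)]
prototypes = [[(5, 9), (20, 25)], [(1, 5), (10, 11)]]
winner = best_matching_unit(sample, prototypes)
```

In a SOM, the winning prototype and its map neighbours would then be moved toward the sample; swapping in a different distance changes only `hausdorff_interval`.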