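A spatial outlier is a spatial object whose attribute values differ markedly from those of the other objects in its spatial neighborhood. Mining spatial outliers can yield a great deal of interesting information. However, spatial data carry spatial characteristics such as complex topological, directional, and metric relations, so traditional outlier-mining algorithms designed for transactional databases are not suitable for spatial databases. This paper proposes SOM, a spatial outlier mining algorithm based on MST (Minimum Spanning Tree) clustering. SOM combines minimum spanning tree theory with a density-based approach, so it captures both the local nature of spatial outliers and their degree of isolation. The algorithm preserves the basic spatial structure of the data through the MST and forms MST clusters by cutting the most inconsistent edges of the MST. Like density-based clustering methods, it can find non-spherical clusters and handle unevenly distributed datasets; in addition, its clustering result does not depend on user-chosen parameters, so the outlier-mining result is more reasonable. Finally, experiments on real data verify the effectiveness of the algorithm and show that it is suitable for outlier mining in large-scale spatial datasets.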
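The MST clustering step can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation. It builds the Euclidean MST with Prim's algorithm over all point pairs rather than through a Delaunay TIN (a Delaunay triangulation contains a Euclidean MST, so the TIN mainly reduces the number of candidate edges). It then scores each edge with a Zahn-style inconsistency ratio (edge length divided by the mean length of the MST edges sharing an endpoint with it) and cuts the `k` highest-scoring edges. The ratio, the parameter `k`, and all function names are hypothetical: the abstract does not say how SOM measures inconsistency or how many edges it cuts.

```python
import math

def prim_mst(points):
    """Euclidean MST via Prim's algorithm on the complete graph, O(n^2).

    Returns the tree as a list of (i, j, length) edges. The abstract pairs
    the MST with a Delaunay TIN; this sketch skips the TIN and compares all pairs.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # shortest known distance from the tree to each vertex
    parent = [-1] * n       # tree vertex that achieves best[v]
    if n:
        best[0] = 0.0
    edges = []
    for _ in range(n):
        # Grow the tree by the closest vertex not yet in it.
        u = min((v for v in range(n) if not in_tree[v]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def inconsistency(edges):
    """Ratio of each edge's length to the mean length of the other MST edges
    sharing an endpoint with it.
    Assumption: a Zahn-style criterion; the abstract does not define SOM's measure.
    """
    incident = {}
    for k, (i, j, _) in enumerate(edges):
        incident.setdefault(i, []).append(k)
        incident.setdefault(j, []).append(k)
    ratios = []
    for k, (i, j, w) in enumerate(edges):
        nbrs = [edges[m][2] for m in set(incident[i]) | set(incident[j]) if m != k]
        if not nbrs:                  # a lone edge has nothing to compare with
            ratios.append(0.0)
            continue
        mean = sum(nbrs) / len(nbrs)
        if mean > 0:
            ratios.append(w / mean)
        else:                         # neighbouring edges have zero length (duplicate points)
            ratios.append(math.inf if w > 0 else 0.0)
    return ratios

def mst_clusters(points, k):
    """Cut the k most inconsistent MST edges and return one cluster label per point.
    Assumption: k is supplied by the caller; how SOM chooses it is not stated.
    """
    edges = prim_mst(points)
    ratios = inconsistency(edges)
    cut = set(sorted(range(len(edges)), key=ratios.__getitem__, reverse=True)[:k])
    parent = list(range(len(points)))

    def find(x):                      # union-find root with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for e, (i, j, _) in enumerate(edges):
        if e not in cut:              # keep uncut edges: their endpoints share a cluster
            parent[find(i)] = find(j)
    roots = {}
    # Number clusters 0, 1, 2, ... in order of first appearance.
    return [roots.setdefault(find(i), len(roots)) for i in range(len(points))]

if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10), (11, 10), (10, 11), (30, 30)]
    print(mst_clusters(pts, k=2))   # [0, 0, 0, 0, 1, 1, 1, 2]
```

Cutting by ratio rather than by raw length is meant to judge each edge against its local surroundings, in the spirit of the local character described above. In the example, the two bridging edges are cut and the distant point (30, 30) ends up alone in its own cluster.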
A spatial outlier is a spatial object whose non-spatial attribute values deviate significantly from those of the other objects in the dataset. Detecting spatial outliers in spatial datasets, and explaining what causes the anomalies in practical applications, have attracted growing interest from researchers. Spatial outlier mining can yield much interesting information. However, because spatial data have complicated characteristics such as topological, orientation, and metric relations, traditional outlier-mining algorithms designed for business databases are ill-suited to spatial datasets. The main problem is that most existing algorithms have difficulty preserving the spatial structure of the data during outlier mining. Because clustering and outlier mining are closely related, clustering-based outlier mining is an important way to detect anomalies in a dataset. However, given the diversity of clustering algorithms, it is difficult to choose a suitable one for outlier mining; moreover, the main purpose of clustering is to find the principal features of a dataset, so outliers are only by-products of clustering. Based on minimum spanning tree clustering, a new spatial outlier mining algorithm called SOM is proposed. The algorithm preserves the basic spatial structure of spatial objects through two geometric structures, the Delaunay triangulated irregular network and the minimum spanning tree (MST), and obtains an MST clustering by cutting off several of the most inconsistent MST edges. As a result, it can find clusters in non-spherical and unbalanced datasets, as density-based clustering algorithms do, and it does not depend on user-preset parameters, so its clustering result is usually more reasonable.
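The abstract treats outliers as objects whose non-spatial attributes stand out within their spatial neighborhood, with MST clusters delimiting those neighborhoods, but it does not give SOM's actual outlier measure. The Python sketch below therefore substitutes a simple, hypothetical one: each object's attribute is compared with the other members of its cluster through a leave-one-out z-score, and objects in very small clusters are reported as isolated instead of being scored. The function name `cluster_attribute_scores` and the three-member threshold are illustrative assumptions.

```python
import math
import statistics

def cluster_attribute_scores(labels, values):
    """Leave-one-out z-score of each object's attribute within its cluster.

    labels[i] is the cluster of object i (e.g. from an MST clustering) and
    values[i] its non-spatial attribute. Assumption: objects in clusters with
    fewer than 3 members get None, i.e. they are treated as isolated because
    there are too few neighbours to estimate a spread.
    """
    members = {}
    for i, lab in enumerate(labels):
        members.setdefault(lab, []).append(i)
    scores = []
    for i, lab in enumerate(labels):
        others = [values[j] for j in members[lab] if j != i]
        if len(others) < 2:           # statistics.stdev needs at least 2 values
            scores.append(None)
            continue
        mu = statistics.mean(others)
        sd = statistics.stdev(others)
        if sd > 0:
            scores.append(abs(values[i] - mu) / sd)
        else:                         # all other members share one value
            scores.append(0.0 if values[i] == mu else math.inf)
    return scores

if __name__ == "__main__":
    labels = [0, 0, 0, 0, 0, 1, 1, 1, 2]
    values = [5.0, 5.2, 4.8, 5.1, 9.0, 2.0, 2.1, 1.9, 7.0]
    scores = cluster_attribute_scores(labels, values)
    scored = [(s, i) for i, s in enumerate(scores) if s is not None]
    print(max(scored)[1])   # 4: the 9.0 reading stands out within cluster 0
    print(scores[8])        # None: object 8 is alone in its cluster
```

Scoring against the rest of the cluster (rather than the whole dataset) keeps the comparison local, and the `None` case keeps isolated objects visible as a separate kind of outlier.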
Finally, the validity of the SOM algorithm is verified through a real application to a dataset of geochemical soil elements collected in the coastal areas of Fujian Province; through analysis it