Distance-based and density-based outlier detection algorithms scale poorly with both dimensionality and data volume. Meanwhile, the spatial autocorrelation and heterogeneity of spatial data make information-theoretic outlier detection algorithms ill-suited to spatial outlier detection, because those algorithms assume mutually independent attributes and are designed for categorical attributes. This paper therefore proposes a spatial outlier detection algorithm for mixed attributes based on holographic entropy. The algorithm partitions the data into regions using a region identifier attribute. Within each region, it determines spatial neighborhoods from spatial relationships and retrieves them with an R*-tree. On this basis, the paper proposes a holographic-entropy-based measure of spatial outlier degree and a spatial outlier mining algorithm, which together effectively address both how to measure the outlier degree of mixed-attribute objects and how to mine the outliers. Because each region can be processed independently, region partitioning lends itself to parallel computing, so the algorithm can handle large data volumes. Theoretical analysis and experimental results show that the proposed algorithm has advantages in both computational efficiency and the interpretability of its results.
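The abstract does not give the formulas for holographic entropy or for the spatial outlier degree, so the following Python sketch only illustrates the overall pipeline (partition by region, find spatial neighbors, score with an entropy measure) under explicit assumptions. It assumes holographic entropy is the sum of per-attribute entropies (equivalently, joint entropy plus total correlation), that numeric attributes are made discrete by equal-width binning, that the spatial neighborhood is the k nearest objects in the same region (a brute-force scan stands in for the R*-tree), and that an object's outlier degree is the entropy it adds to its neighborhood. The record layout and the names `spatial_outlier_scores`, `holo_entropy`, and `discretize` are hypothetical, not the paper's.

```python
import math
from collections import Counter, defaultdict

RESERVED = ("id", "region", "x", "y")  # hypothetical record layout


def entropy(column):
    """Shannon entropy (bits) of one discrete attribute column."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())


def holo_entropy(rows):
    """Assumed holographic entropy: sum of per-attribute entropies,
    i.e. joint entropy plus total correlation."""
    return sum(entropy(col) for col in zip(*rows)) if rows else 0.0


def discretize(values, bins):
    """Equal-width binning so numeric attributes can enter the entropy (assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def spatial_outlier_scores(records, numeric, k=3, bins=4):
    """Return {id: outlier degree}; a higher value means more outlying.
    `numeric` lists the numeric attribute names; other attributes are categorical."""
    regions = defaultdict(list)
    for r in records:  # step 1: partition by the region identifier attribute
        regions[r["region"]].append(r)
    scores = {}
    # Regions are independent, so this loop could be distributed across workers.
    for members in regions.values():
        attrs = sorted(a for a in members[0] if a not in RESERVED)
        cols = {a: [m[a] for m in members] for a in attrs}
        for a in numeric:
            cols[a] = discretize(cols[a], bins)
        rows = [tuple(cols[a][i] for a in attrs) for i in range(len(members))]
        for i, m in enumerate(members):
            # step 2: k nearest neighbors within the same region
            # (a brute-force scan standing in for an R*-tree query)
            others = sorted(
                (j for j in range(len(members)) if j != i),
                key=lambda j: (members[j]["x"] - m["x"]) ** 2 + (members[j]["y"] - m["y"]) ** 2,
            )
            nbrs = [rows[j] for j in others[:k]]
            # step 3: outlier degree = entropy added by joining the neighborhood
            scores[m["id"]] = holo_entropy(nbrs + [rows[i]]) - holo_entropy(nbrs)
    return scores


if __name__ == "__main__":
    records = [
        {"id": "a1", "region": "A", "x": 0, "y": 0, "land": "res", "val": 10},
        {"id": "a2", "region": "A", "x": 1, "y": 0, "land": "res", "val": 11},
        {"id": "a3", "region": "A", "x": 0, "y": 1, "land": "res", "val": 12},
        {"id": "a4", "region": "A", "x": 1, "y": 1, "land": "ind", "val": 100},
        {"id": "a5", "region": "A", "x": 2, "y": 0, "land": "res", "val": 10},
        {"id": "b1", "region": "B", "x": 0, "y": 0, "land": "ind", "val": 100},
    ]
    scores = spatial_outlier_scores(records, numeric=["val"], k=3, bins=4)
    for rid in sorted(scores, key=scores.get, reverse=True):
        print(rid, round(scores[rid], 3))
```

In this toy data, `a4` differs from its neighbors in both the categorical attribute (`land`) and the binned numeric attribute (`val`), so it should rank first. `b1` has identical attribute values but sits in region B, where it has no neighbors, so it is never compared with region A's objects. This shows one effect of partitioning by the region identifier.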