To address the poor time and space efficiency of existing outlier mining methods on high-dimensional, massive data, this paper proposes RRGOD, an outlier detection algorithm based on rough reduction and grids. Building on the density-based outlier detection algorithm LOF and drawing on rough set theory, the algorithm introduces the concept of attribute weight and removes attributes whose weight falls below an importance threshold, which lowers the dimensionality and so reduces the computation required for clustering. In the grid clustering stage, the traditional grid partitioning method is improved: the concept of an attribute-dimension radius vector is introduced, and a variable grid partitioning method is proposed that divides the grid space adaptively according to the characteristics of the data set. Experiments were conducted on real and simulated data sets. The results show that the algorithm markedly improves detection efficiency while maintaining sufficient accuracy.
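The two preprocessing ideas named in the abstract can be illustrated with a minimal sketch. It is not the RRGOD implementation. It assumes that attribute weights and the per-dimension radius vector are already available as plain lists (the abstract does not give the rough-set formula for the weights or the rule for choosing the radii), that a cell is twice the radius wide in each dimension, and that points in sparsely populated cells are the ones kept as candidates for a later LOF scoring step, which is omitted here. The names `reduce_attributes`, `cell_of`, `sparse_cell_candidates` and the parameter `min_pts` are hypothetical.

```python
import math
from collections import defaultdict


def reduce_attributes(rows, weights, threshold):
    """Keep only the columns whose weight reaches the importance threshold.

    The weights are taken as given; how they are derived from rough set
    theory is not specified in the abstract (assumption).
    """
    kept = [j for j, w in enumerate(weights) if w >= threshold]
    reduced = [[row[j] for j in kept] for row in rows]
    return reduced, kept


def cell_of(point, radii):
    """Map a point to a grid cell; dimension j uses cell width 2 * radii[j].

    Using one radius per dimension is what makes the grid "variable" in
    this sketch (assumed reading of the attribute-dimension radius vector).
    """
    return tuple(math.floor(x / (2.0 * r)) for x, r in zip(point, radii))


def sparse_cell_candidates(rows, radii, min_pts):
    """Return the indices of points lying in cells with fewer than min_pts points."""
    cells = defaultdict(list)
    for i, row in enumerate(rows):
        cells[cell_of(row, radii)].append(i)
    return sorted(
        i
        for members in cells.values()
        if len(members) < min_pts
        for i in members
    )


if __name__ == "__main__":
    # Toy data: the second attribute is noisy and carries a low weight.
    rows = [
        [1.0, 7.0, 1.1],
        [1.2, 3.0, 0.9],
        [0.8, 9.0, 1.3],
        [1.1, 1.0, 1.0],
        [9.0, 5.0, 9.5],
    ]
    weights = [0.6, 0.05, 0.35]  # illustrative values only
    reduced, kept = reduce_attributes(rows, weights, threshold=0.1)
    candidates = sparse_cell_candidates(reduced, radii=[1.0, 1.0], min_pts=2)
    print(kept, candidates)
```

In this toy run the low-weight second attribute is dropped, the four nearby points share one cell, and only the isolated fifth point remains as a candidate. In a full pipeline, only such candidates would be passed to a density-based scorer such as LOF, which is where the saving in computation would come from.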