Most existing local outlier mining algorithms require the user to set parameters or thresholds in advance and are difficult to apply to high-dimensional data sets. This paper presents a new local outlier mining algorithm, PSO-SPLOF. First, the data set is partitioned into mutually disjoint subspaces, the quality of a subspace partition is judged by its skewness, and particle swarm optimization is used to search for the optimal set of partitioned subspaces. Second, for each subspace in the optimal partition, the local outlier factor (SPLOF) of each data object is computed, and the SPLOF value is used to measure that object's degree of local deviation. Finally, experiments on discretized celestial spectral data show that PSO-SPLOF is little affected by human factors, scales well, and is computationally efficient.