To address the problem that traditional distance measures become ineffective on high-dimensional data, this paper proposes a shared nearest neighbor subspace clustering algorithm (SNN_SC). The algorithm converts the data set, dimension by dimension, into multiple nearest-neighbor transaction databases and mines the maximal co-occurring object sets in each database; these sets are the clusters on a single dimension. Closed frequent itemsets are then mined from the set of one-dimensional clusters: the dimensions that contain a closed frequent itemset form a subspace, and the closed frequent itemset itself is a cluster in that subspace. Comparative experiments show that SNN_SC locates subspaces more accurately and produces complete clusters in those subspaces.
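As an illustration, the two-stage pipeline outlined above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names (`knn_1d`, `clusters_1d`, `subspace_clusters`) and the parameters (`k`, `min_shared`, `min_dims`, `min_size`) are hypothetical. The one-dimensional step is simplified: instead of mining maximal co-occurring object sets, it links two objects when they are mutual neighbors sharing at least `min_shared` neighbors and takes connected components. Closed itemsets are found by brute-force intersection of one-dimensional clusters, which is only practical for toy data.

```python
from itertools import combinations


def knn_1d(values, k):
    """Neighbor transaction per object: its k nearest objects on one dimension, plus itself."""
    n = len(values)
    transactions = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: (abs(values[j] - values[i]), j))
        transactions.append(frozenset(others[:k]) | {i})
    return transactions


def clusters_1d(values, k, min_shared):
    """Assumed simplification: connected components of mutual, shared-neighbor links."""
    nn = knn_1d(values, k)
    n = len(values)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if j in nn[i] and i in nn[j] and len(nn[i] & nn[j]) >= min_shared:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), set()).add(i)
    return [frozenset(g) for g in groups.values() if len(g) > 1]


def subspace_clusters(data, k=2, min_shared=3, min_dims=2, min_size=2):
    """Map each closed object set to the dimensions whose 1-D clusters contain it."""
    n_dims = len(data[0])
    # One transaction per 1-D cluster, tagged with its dimension.
    transactions = [(d, c)
                    for d in range(n_dims)
                    for c in clusters_1d([row[d] for row in data], k, min_shared)]

    # Closed itemsets are exactly the intersections of transactions,
    # so keep intersecting until no new object set appears.
    candidates = {c for _, c in transactions}
    frontier = set(candidates)
    while frontier:
        new = set()
        for a in frontier:
            for _, c in transactions:
                inter = a & c
                if len(inter) >= min_size and inter not in candidates:
                    new.add(inter)
        candidates |= new
        frontier = new

    result = {}
    for objs in candidates:
        dims = frozenset(d for d, c in transactions if objs <= c)
        if len(dims) >= min_dims:  # support counted in dimensions
            result[objs] = dims
    return result


# Toy data (made up for illustration): objects 0-2 are close on dimensions 0 and 1,
# objects 3-5 are close on dimensions 1 and 2.
data = [
    [1.0, 2.0, 0.0],
    [1.1, 2.1, 10.0],
    [1.2, 2.2, 30.0],
    [5.0, 8.0, 5.0],
    [9.0, 8.1, 5.1],
    [20.0, 8.2, 5.2],
]
result = subspace_clusters(data, k=2, min_shared=3)
for objs, dims in result.items():
    print(sorted(objs), "in subspace", sorted(dims))
```

On this toy data the sketch reports objects 0 to 2 as a cluster in the subspace of dimensions 0 and 1, and objects 3 to 5 as a cluster in the subspace of dimensions 1 and 2. Because neighbors are ranked per dimension rather than compared by a global distance threshold, the grouping does not depend on the scale of any single dimension.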