针对经典的流形学习算法Isomap在非线性数据稀疏时降维效果下降甚至失效的问题,提出改进的切近邻等距特征映射算法(Cut-Neighbors Isometric feature mapping,CN-Isomap).该算法在数据稀疏的情况下首先通过有效识别样本点的"流形邻居"来剔除近邻图上的"短路"边,然后再通过最短路径算法拟合测地线距离,使得拟合的测地线距离不会偏离流形区域,从而低维嵌入映射能够正确地反映高维输入空间样本点间的内在拓扑特征,很好地发现蕴含在高维空间里的低维流形,有效地对非线性稀疏数据进行降维.通过对Benchmark数据集的实验表明了算法的有效性.CN-Isomap算法是Isomap算法的推广,不仅能有效地对非线性稀疏数据进行降维,同样也适用于数据非稀疏的情况.
An improved algorithm Cut-Neighbors Isometric feature mapping(CN-Isomap) is proposed after analyzing why Isomap,a classic manifold learning algorithm,can not reduce dimensionality effectively for sparse nonlinear data.The algorithm first identifies the 'manifold neighbors' effectively when data is sparse in order to delete the 'short circuit' edges in neighborhood graph.Then it simulates the geodesic distance by shortest path algorithm,so that the geodesic distance will not deviate from the manifold region.Thus low-dimensional embedded mapping can correctly reflect the inherent topological features of sample points in high-dimensional input space,which enables that the algorithm finds the low-dimensional manifolds implicated in high-dimensional space better and reduce dimensionality of sparse nonlinear data effectively.The effectiveness of the algorithm is verified by experiment on Benchmark data set.CN-Isomap is the extension of Isomap.It is not only effective for dimensionality reduction of sparse nonlinear data,but also applicable to non-sparse data.