近年来,流形学习成为包括机器学习、模式识别和计算机视觉等相关领域的研究热点.流形学习算法中,邻域选择直接关系到算法的性能,而传统的邻域选择算法如k近邻和ε邻域算法存在参数难以确定,所构建邻域不能反映流形学习算法对邻域要求等缺点.提出了一种基于流形局部线性结构的自适应邻域选择算法(ANSLL).首先通过分析现有流形学习算法,总结出构建邻域的两个基本原则:1)同一邻域的所有点都近似地位于某一d维线性子空间内(d为流形维数);2)每个邻域包含尽可能多的点.基于这两个基本原则,ANSLL算法采用主成分分析技术(PCA)度量有限点集的线性程度,通过邻域压缩或扩张方式自适应地构建邻域.针对邻域线性结构的特点,还提出了一种改进的邻域图构建方法,以提高等度映射(Isomap)算法中测地线距离估计的准确性.最后大量系统的实验表明,ANSLL算法能够依据流形的局部曲率自适应地构建邻域,从而提高大多数流形学习算法(如Isomap和LLE)的性能.
Recently,manifold learning has attracted extensive interests of researchers from machine learning,pattern recognition,computer vision and related communities.Constructing a reasonable neighborhood for each point is a key issue in manifold learning.Conventional neighborhood selection algorithms k nearest neighbors(k-NN) and ε-neighborhood need to specifiy the parameter manually beforehand which most manifold learning algorithms are sensitive to.In this paper,an adaptive neighborhood selection algorithm based on local linearity called ANSLL is presented.Firstly,two principles of construction neighborhood for most existing manifold learning algorithms are summarized as: 1) data points in the same neighborhood should approximately lie on a d-dimensional linear subspace(d is the dimensionality of data manifold) and 2) each neighborhood should be as large as possible.Then based on these two principles,ANSLL exploits principal component analysis(PCA) technique to measure the linearity of finite data points set and constructs neighborhood for each point via neighborhood contraction or expansion.In addition,an improved method of constructing neighborhood graph,which can improve the accuracy of geodesic distance estimation for isometric embedding,is also proposed.Finally,extensive and systematic experiments on both synthetic data sets and real world data sets demonstrate that the ANSLL algorithm can adaptively tune neighborhood size according to local curvature of data manifold and then improve the performance of most existing manifold algorithms,such as Isomap and LLE.