To address the problem that linear dimensionality reduction techniques cannot yield satisfactory results when applied to data with nonlinear structure, a new nonlinear stochastic dimensionality reduction algorithm (NNSE) focused on preserving the local nearest-neighbor information of the high-dimensional space is proposed. The algorithm first finds the nearest neighbors of each sample point in the high-dimensional space by computing the Euclidean distances between sample points, and then generates a random initial distribution in the low-dimensional space; the low-dimensional points are then repeatedly moved towards the mean position of their nearest neighbors until a stable low-dimensional embedding is obtained. Compared with a state-of-the-art nonlinear stochastic dimensionality reduction algorithm, t-distributed Stochastic Neighbor Embedding (t-SNE), the low-dimensional result of NNSE is visually comparable to that of t-SNE, but a comparison of quantitative metrics shows that NNSE is clearly superior to t-SNE in preserving nearest-neighbor information.
As linear dimensionality reduction methods usually cannot produce a satisfactory low-dimensional embedding when applied to data with nonlinear structure, a new nonlinear stochastic dimensionality reduction method named NNSE was proposed to preserve the local nearest-neighbor information of the high-dimensional space. Firstly, the nearest neighbors of every sample were found by computing the Euclidean distances between sample points in the high-dimensional space, and a random initial distribution of the points was generated in the low-dimensional space. Then, by repeatedly moving each low-dimensional point towards the mean position of its nearest neighbors found in the high-dimensional space, the point positions were iteratively optimized until the embedding became stable. Compared with a state-of-the-art nonlinear stochastic dimensionality reduction method, t-distributed Stochastic Neighbor Embedding (t-SNE), the low-dimensional embedding produced by NNSE is visually similar to that of t-SNE; however, a quantitative indicator shows that NNSE is clearly superior to t-SNE in preserving the local nearest-neighbor information in the low-dimensional embedding.
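The following is a minimal sketch of the iterative update described in the abstract: nearest neighbors are computed once in the high-dimensional space, the low-dimensional points start from a random distribution, and each point is pulled towards the mean low-dimensional position of its high-dimensional neighbors until the positions stop changing. The parameter names and values (n_neighbors, step, n_iters, the convergence tolerance) are illustrative assumptions, not the settings of the original NNSE paper.

```python
import numpy as np

def nnse_embed(X, n_components=2, n_neighbors=10, n_iters=200,
               step=0.5, tol=1e-6, random_state=0):
    """Sketch of an NNSE-style embedding (assumed parameters)."""
    rng = np.random.default_rng(random_state)
    n = X.shape[0]

    # Pairwise Euclidean distances (squared) in the high-dimensional space.
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)          # exclude each point itself
    # Indices of the k nearest neighbors of every sample point.
    nbrs = np.argsort(d2, axis=1)[:, :n_neighbors]

    # Random initial distribution in the low-dimensional space.
    Y = rng.normal(scale=1e-2, size=(n, n_components))

    for _ in range(n_iters):
        # Mean low-dimensional position of each point's high-dim neighbors.
        target = Y[nbrs].mean(axis=1)
        shift = step * (target - Y)       # move towards the neighbor mean
        Y += shift
        if np.abs(shift).max() < tol:     # embedding has become stable
            break
    return Y
```

As a usage note, `nnse_embed(X)` on an (n, d) data matrix returns an (n, 2) embedding; the brute-force distance matrix is only practical for small n and stands in for whatever neighbor search the published method actually uses.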