FastMap是经典多维标度法(classical multidimensional scaling,CMDS)的一种快速算法,它包含一系列的投影.在每次投影中,两个相距较远的点被选为枢轴点,连接枢轴点得到一个枢轴;然后将各样本投影到枢轴上;最后,修改所有样本间的距离.FastMap的不足在于只能得到CMDS的近似解.对FastMap进行了深入分析,指出FastMap的本质就是把各样本投影到由枢轴确定的一组正交向量上.由于这组向量通常不同于样本集的主轴,使得FastMap只能得到CMDS的近似解;并指出FastMap算法的最大投影次数等于样本集的内在维数.在此理论分析的基础上,提出了一种改进的FastMap算法—iFastMap(improved FastMap)算法.通过对FastMap坐标进行主成分分析,iFastMap得到了与CMDS完全一致的解;此外,从样本集中选取一个内在维数等于整个样本集内在维数的子集,将枢轴点的选取限定在这个子集上,并在每次投影后只修改枢轴点与各样本间的距离,iFastMap的速度得到进一步提高.实验验证了iFastMap与CMDS解的完全一致性及其高效性.
Classical multidimensional scaling(CMDS)is a very common method for dimensionality reduction,data visualization,machine learning,pattern recognition,etc.When the distance is Euclidean,the CMDS solution can be viewed as the projections of the samples onto the principal axes of the sample set.The problem of CMDS is that its time increases rapidly with the increase of data size.As one of the fast variants of CMDS,the FastMap algorithm has been widely used in many fields.It includes some recursive projections.And each projection consists of three steps.Firstly,apivot is obtained by passing through two far apart points(referred to as pivot points from now on);then the samples are projected onto the pivot and the coordinates of the samples in a low-dimensional Euclidean space are obtained;and finally,the distances between all samples are modified.The shortcoming of the FastMap algorithm is that it can only find approximate solution of CMDS.In this paper,the FastMap algorithm is analyzed in detail.It is found that the essence of the FastMap algorithm is to project the samples onto a set of mutually orthogonal directions determined by the pivots.Since these directions are usually different from the principal axes of the sample set,the Fast-Map algorithm can only get the approximate solution of CMDS.It is also found that the pivots can be selected from a subset provided that its intrinsic dimension is equal to that of the whole sample set.Last but not least,it is found that only the distances between the pivot points and the samples are used to obtain the FastMap coordinates.That is to say,it's unnecessary to modify the distances between all the samples.Based on the theoretical analysis,an improved algorithm called iFastMap(improved FastMap)is put forward in this paper.By introducing principal component analysis,the FastMap coordinates can be aligned with the CMDS coordinates.As a result,the iFastMap algorithm can find exactly the same solution as that of CMDS.In addition,by selecting a subset with the same