General dimensionality reduction methods reveal only one or a few aspects of the characteristics of high-dimensional data. To address this limitation, the Parametric Embedding (PE) algorithm was applied to embed all of these characteristics in a low-dimensional visualization space. First, assuming that each class follows a Gaussian distribution in the embedding space, the posterior class probabilities of the training data in the high-dimensional space were obtained with a Support Vector Machine (SVM) classifier. PE was then applied to obtain the coordinates and class properties of the test data in the embedding space, thereby visualizing the classification of high-dimensional data. Experimental results on the USPS, YaleB, and Mini Newsgroups data sets show that the method represents not only the relations between data and their classes, but also the relations within the set of data and within the set of classes, which facilitates in-depth study of the properties of high-dimensional data classification. The visualizations produced by PE are better than the low-dimensional plots obtained by directly applying algorithms such as PCA and ISOMAP. In addition, the computational complexity of PE is only the product of the number of objects and the number of classes, which makes it well suited to visualizing the classification of large data sets with few classes.
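The embedding step can be illustrated with a minimal sketch. It assumes a common formulation of PE: each class k is a unit-variance spherical Gaussian centred at phi_k in the embedding space with equal class priors, so the embedding posterior is p(c_k|r_n) ∝ exp(-||r_n - phi_k||^2 / 2). The object coordinates r_n and class centres phi_k are found by plain gradient descent on the sum over objects of KL(p(c|x_n) || p(c|r_n)), where p(c_k|x_n) are the given posteriors (a toy matrix here, standing in for SVM outputs). The function names, learning rate, iteration count and random initialisation are illustrative assumptions, not the paper's implementation. Each gradient evaluation visits all N×K object-class pairs, which is where the product in the complexity claim comes from.

```python
import math
import random


def embed_posteriors(R, Phi):
    """p(c_k | r_n) under unit-variance Gaussians with equal class priors (assumption)."""
    P = []
    for r in R:
        # Negative half squared distance from r to each class centre phi_k.
        logits = [-0.5 * sum((a - b) ** 2 for a, b in zip(r, phi)) for phi in Phi]
        m = max(logits)  # subtract the max for numerical stability
        w = [math.exp(z - m) for z in logits]
        s = sum(w)
        P.append([x / s for x in w])
    return P


def kl_objective(Q, P):
    """Sum over objects of KL(p(c | x_n) || p(c | r_n))."""
    return sum(q * math.log(q / p)
               for qn, pn in zip(Q, P) for q, p in zip(qn, pn) if q > 0)


def parametric_embedding(Q, dim=2, lr=0.1, iters=1000, seed=0):
    """Hypothetical PE sketch: Q[n][k] is the given posterior p(c_k | x_n)."""
    rng = random.Random(seed)
    N, K = len(Q), len(Q[0])
    R = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(N)]
    Phi = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(K)]
    for _ in range(iters):
        P = embed_posteriors(R, Phi)
        gR = [[0.0] * dim for _ in range(N)]
        gPhi = [[0.0] * dim for _ in range(K)]
        # One pass over all N*K object-class pairs per gradient evaluation.
        for n in range(N):
            for k in range(K):
                c = Q[n][k] - P[n][k]
                for d in range(dim):
                    diff = R[n][d] - Phi[k][d]
                    gR[n][d] += c * diff    # dE/dr_n   = sum_k (q - p)(r_n - phi_k)
                    gPhi[k][d] -= c * diff  # dE/dphi_k = sum_n (q - p)(phi_k - r_n)
        # Simultaneous gradient step on objects and class centres.
        for n in range(N):
            for d in range(dim):
                R[n][d] -= lr * gR[n][d]
        for k in range(K):
            for d in range(dim):
                Phi[k][d] -= lr * gPhi[k][d]
    return R, Phi


# Toy posteriors for 6 objects over 3 classes (stand-in for SVM outputs).
Q = [[0.9, 0.05, 0.05], [0.85, 0.1, 0.05], [0.05, 0.9, 0.05],
     [0.1, 0.8, 0.1], [0.05, 0.05, 0.9], [0.1, 0.1, 0.8]]
R, Phi = parametric_embedding(Q)
print(kl_objective(Q, embed_posteriors(R, Phi)))
```

In practice the rows of Q could come from any probabilistic classifier; for example, scikit-learn's `SVC(probability=True).predict_proba` returns per-class probabilities for an SVM, though whether that matches the calibration used in the paper is an assumption. The plotted points are R and the class centres are Phi.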