The locally linear embedding (LLE) algorithm can only be used in unsupervised machine learning. To address this limitation, this paper combines LLE with semi-supervised learning and proposes a text classification method based on a semi-supervised locally linear embedding algorithm. First, using the manifold structure of the text data together with a small number of labeled samples, the method adjusts the distance matrix in LLE with a piecewise function. Second, the samples are linearly reconstructed from the adjusted matrix to achieve dimensionality reduction. Third, to address the shortcomings of the Euclidean distance in semi-supervised LLE, the Euclidean distance is transformed by a Gaussian kernel function and replaced with the resulting kernel distance, which yields a kernel-based semi-supervised locally linear embedding algorithm. Finally, simulation experiments verify the effectiveness of the improved algorithms for text classification.
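
The distance adjustment and the kernel distance described above can be illustrated with a minimal sketch. The abstract does not give the exact piecewise function, so the form below is an assumption: distances between two samples with the same label are shrunk by a hypothetical factor `alpha`, distances between samples with different labels are stretched by a hypothetical factor `beta`, and pairs involving an unlabeled sample (label `-1`) are left unchanged. The kernel distance uses the standard identity d_K(x, y)^2 = K(x, x) - 2K(x, y) + K(y, y), which for a Gaussian kernel equals 2 - 2 exp(-||x - y||^2 / (2 sigma^2)). The function names and the parameters `sigma`, `alpha`, and `beta` are illustrative and not taken from the paper.

```python
import numpy as np


def gaussian_kernel_distance(X, sigma=1.0):
    """Pairwise distance induced by a Gaussian kernel.

    d_K(x, y) = sqrt(2 - 2 * exp(-||x - y||^2 / (2 * sigma^2)))
    """
    sq_norms = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances via ||x||^2 + ||y||^2 - 2 x.y
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against round-off
    K = np.exp(-sq_dists / (2.0 * sigma ** 2))
    return np.sqrt(np.maximum(2.0 - 2.0 * K, 0.0))


def adjust_distances(D, labels, alpha=0.5, beta=1.5):
    """Piecewise adjustment of a distance matrix (assumed form).

    labels: integer array, with -1 marking unlabeled samples.
    Same-label pairs are scaled by alpha, different-label pairs by beta,
    and pairs with at least one unlabeled sample are unchanged.
    """
    labels = np.asarray(labels)
    labeled = labels >= 0
    both_labeled = labeled[:, None] & labeled[None, :]
    same = both_labeled & (labels[:, None] == labels[None, :])
    different = both_labeled & ~same
    D_adj = D.copy()
    D_adj[same] *= alpha
    D_adj[different] *= beta
    return D_adj


def nearest_neighbors(D, k):
    """Indices of the k nearest neighbors of each sample, excluding itself."""
    D_no_self = D.copy()
    np.fill_diagonal(D_no_self, np.inf)
    return np.argsort(D_no_self, axis=1)[:, :k]


# Toy data: three 2-D points, the first two sharing a label.
X = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 0.0]])
labels = np.array([0, 0, 1])
D = gaussian_kernel_distance(X, sigma=1.0)
D_adj = adjust_distances(D, labels)
neighbors = nearest_neighbors(D_adj, k=1)
```

The neighbor indices selected from the adjusted matrix would then feed the usual LLE steps (solving for reconstruction weights and computing the low-dimensional embedding), which are omitted here.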