Text clustering is widely used in many fields. However, traditional text clustering methods ignore semantic factors, so their clustering results are unsatisfactory. In this paper, we use semantics to transform the vector space model (VSM): each dimension of the VSM is distorted according to semantics, turning the original orthogonal coordinate system into an oblique one. The feature vectors of the texts are then mapped onto the transformed VSM before clustering. This reduces the semantic distance between semantically related feature vectors, which improves the recall and precision of text clustering and makes the clustering results more semantically meaningful.
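The abstract does not give the transformation formula, so the following is only a minimal sketch of one common way to realize an oblique coordinate system: the inner product between two term-weight vectors is computed through a Gram matrix whose off-diagonal entries hold the semantic similarity between terms. The three terms, the similarity value 0.8, and the function names `oblique_dot` and `oblique_cosine` are assumptions made for illustration and do not come from the paper.

```python
import math

def oblique_dot(x, y, gram):
    """Inner product x^T G y, where G holds the inner products of the term axes."""
    return sum(x[i] * gram[i][j] * y[j]
               for i in range(len(x))
               for j in range(len(y)))

def oblique_cosine(x, y, gram):
    """Cosine similarity measured in the coordinate system described by gram."""
    norm_x = math.sqrt(oblique_dot(x, x, gram))
    norm_y = math.sqrt(oblique_dot(y, y, gram))
    return oblique_dot(x, y, gram) / (norm_x * norm_y)

# Hypothetical vocabulary: index 0 = "car", 1 = "automobile", 2 = "banana".
# The identity matrix corresponds to the classic orthogonal VSM.
ORTHOGONAL_GRAM = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

# Assumed semantic similarities: "car" and "automobile" are close (0.8),
# "banana" is unrelated to both. The matrix should be symmetric and
# positive semidefinite so that it defines a valid geometry.
SEMANTIC_GRAM = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

doc_car = [2.0, 0.0, 0.0]      # a text that only mentions "car"
doc_auto = [0.0, 3.0, 0.0]     # a text that only mentions "automobile"
doc_banana = [0.0, 0.0, 1.0]   # a text that only mentions "banana"

sim_orthogonal = oblique_cosine(doc_car, doc_auto, ORTHOGONAL_GRAM)
sim_semantic = oblique_cosine(doc_car, doc_auto, SEMANTIC_GRAM)
sim_unrelated = oblique_cosine(doc_car, doc_banana, SEMANTIC_GRAM)

print(sim_orthogonal, sim_semantic, sim_unrelated)
```

In the orthogonal model the "car" and "automobile" texts share no term and have zero similarity, whereas in the oblique model their similarity is approximately the assumed term similarity, and the unrelated text stays at zero. Any standard clustering algorithm could then be run on distances derived from `oblique_cosine`; how the paper itself derives the term similarities and performs the clustering is not stated in the abstract.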