To alleviate the sparsity of the vector space model and improve text classification performance without introducing external knowledge, this paper mines the relationships among terms and among documents within the corpus itself. It then integrates these relationships into the original term-document matrix in different ways, forming 4 new text representation models, whose representational power is evaluated through text classification experiments. The results show that incorporating term and document relationships clearly improves the classification performance of both KNN and SVM.
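The abstract does not say exactly how the relationships are fused into the matrix. The sketch below is only a rough illustration of the general idea, and it is not the authors' four models. It assumes cosine similarity computed from the corpus matrix itself as the relationship measure. It then shows two ways such relations can densify a sparse document-term matrix X:

- multiplying by a term-term similarity matrix (X·S_t), so each document gains weight on terms related to the ones it contains;
- premultiplying by a document-document similarity matrix (S_d·X), so each document borrows weight from similar documents.

All function names and the toy matrix are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def matmul(a, b):
    """Plain-Python product of an m x k matrix and a k x n matrix."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def term_similarity(x):
    """Hypothetical term-term relation: cosine between term columns (co-occurrence)."""
    cols = list(zip(*x))
    return [[cosine(ci, cj) for cj in cols] for ci in cols]

def doc_similarity(x):
    """Hypothetical document-document relation: cosine between document rows."""
    return [[cosine(ri, rj) for rj in x] for ri in x]

def sparsity(x):
    """Fraction of zero entries in a matrix."""
    cells = [v for row in x for v in row]
    return sum(1 for v in cells if v == 0) / len(cells)

# Toy corpus: 3 documents (rows) x 4 terms (columns), raw term counts.
X = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]

X_terms = matmul(X, term_similarity(X))  # spread weight to related terms
X_docs = matmul(doc_similarity(X), X)    # borrow weight from similar documents

print(round(sparsity(X), 3), round(sparsity(X_terms), 3), round(sparsity(X_docs), 3))
# 0.5 0.167 0.167
```

In this toy case, both enriched matrices have far fewer zero entries than the original. That is the sparsity reduction the abstract aims for. On a real corpus one would typically threshold or sparsify the similarity matrices, since a dense S_t or S_d can make the enriched matrix fully dense.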