This paper proposes a method for Chinese text topic classification based on Non-negative Matrix Factorization (NMF). The NMF algorithm decomposes the term-document matrix to capture the correlations between terms, which effectively mitigates the effects of synonymy and polysemy. Experimental results show that, compared with Latent Semantic Indexing (LSI) based on Singular Value Decomposition (SVD), the method computes faster and requires less storage. When the number of latent semantic dimensions is reduced substantially, the NMF method also achieves better classification precision.
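As an illustration of the decomposition step, the following is a minimal sketch of factorizing a term-document matrix V into non-negative factors W (terms by latent topics) and H (latent topics by documents). The abstract does not state which NMF algorithm is used, so the sketch assumes the standard multiplicative update rules for the Frobenius-norm objective. The function name `nmf`, the toy matrix, the rank, and the iteration count are all hypothetical choices made for illustration.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate V (terms x documents) as W @ H with W, H >= 0.

    Assumption: multiplicative updates for the Frobenius norm; the
    source abstract does not specify the update scheme.
    """
    rng = np.random.default_rng(seed)
    n_terms, n_docs = V.shape
    # Strictly positive random initialisation keeps the updates well defined.
    W = rng.random((n_terms, rank)) + eps
    H = rng.random((rank, n_docs)) + eps
    for _ in range(n_iter):
        # eps in the denominators guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical toy term-document matrix: rows are terms, columns are documents.
# Terms 0-1 occur only in documents 0-1, terms 2-3 only in documents 2-3.
V = np.array([
    [2.0, 3.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 1.0],
    [0.0, 0.0, 2.0, 2.0],
])

W, H = nmf(V, rank=2)
error = np.linalg.norm(V - W @ H)  # Frobenius reconstruction error
print("reconstruction error:", error)
print("document representations (columns of H):")
print(H)
```

Each column of H is a low-dimensional, non-negative representation of one document, which could then be passed to any classifier. Because W and H contain no negative entries, they may also be stored sparsely, which is one plausible reading of the reduced storage reported relative to SVD-based LSI.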