Research on a Text Classification System Based on a Dimensionality Reduction Method Combining DF and LSA

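ISSN: 1674-4578
Journal: Shanxi Electronic Technology (《山西电子技术》)
Date: 0
Classification: TP393 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
Author affiliation: [1] School of Information and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650051, China
Funding: This project was supported by the National Natural Science Foundation of China (60663004)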

Keywords: text classification, latent semantic analysis, document frequency, support vector machine (SVM)

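Chinese abstract (translated):

This paper introduces the principles of a Chinese text classification system. For feature extraction it combines the document frequency (DF) method with latent semantic analysis (LSA): the DF method first filters out terms with low DF values, which reduces the sparsity of the text matrix, and LSA then analyzes the semantic relations among terms to eliminate the influence of synonyms and polysemous words, improving both the speed and the accuracy of text classification. Experimental results show that this dimensionality reduction method achieves good results.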

English abstract:

This paper introduces the principles of Chinese text classification systems. A method combining document frequency (DF) and latent semantic analysis (LSA) is used for feature extraction. First, the DF method filters out terms with low DF values, which reduces the sparsity of the text matrix. The LSA method then analyzes the semantic relations among the words and eliminates the influence of synonyms and polysemous words. The combined method raises the speed and accuracy of text classification. The experimental results show that the proposed method for text classification is promising.
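
The two-stage reduction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the `min_df` threshold, the choice of `k`, and the helper names `df_filter`, `term_doc_matrix`, `lsa_project`, and `cosine` are all assumptions made for illustration. LSA is approximated here by a truncated SVD of a raw term-count matrix, and the SVM classification step mentioned in the keywords is omitted.

```python
import numpy as np

def df_filter(docs, min_df):
    """Stage 1 (DF): keep terms that occur in at least min_df documents."""
    df = {}
    for tokens in docs:
        for term in set(tokens):          # count each term once per document
            df[term] = df.get(term, 0) + 1
    return sorted(t for t, n in df.items() if n >= min_df)

def term_doc_matrix(docs, vocab):
    """Build a terms x documents count matrix over the filtered vocabulary."""
    index = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, tokens in enumerate(docs):
        for term in tokens:
            if term in index:
                A[index[term], j] += 1.0
    return A

def lsa_project(A, k):
    """Stage 2 (LSA): truncated SVD; returns one k-dimensional row per document."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (s[:k, None] * Vt[:k]).T

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical pre-tokenized corpus: four networking documents, three SVM documents.
docs = [
    ["network", "router", "packet"],
    ["network", "packet", "latency"],
    ["router", "network", "switch"],
    ["network", "packet", "router"],
    ["svm", "kernel", "margin"],
    ["svm", "margin", "classifier"],
    ["kernel", "svm", "hyperplane"],
]

vocab = df_filter(docs, min_df=2)   # drops terms seen in only one document
A = term_doc_matrix(docs, vocab)    # smaller, denser matrix after DF filtering
Z = lsa_project(A, k=2)             # documents in a 2-dimensional latent space
```

In this sketch the rows of `Z` would be the feature vectors handed to a classifier. Filtering before the SVD matters mainly for cost: the decomposition runs on a matrix with fewer rows, which is the speed benefit the abstract attributes to the DF stage.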
