Research on a Text Classification Method Based on an Improved TF-IDF Algorithm

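ISSN: 1000-3290
Journal: Acta Physica Sinica (《物理学报》)
Date: 0
Classification: TP393 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
Author affiliation: School of Information Engineering, Guangdong University of Technology, Guangzhou 510006, Guangdong, China
Funding: National Natural Science Foundation of China (Grant No. 11204043)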

Keywords: keyword extraction, feature selection, text classification, preprocessing

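Chinese abstract (translated):

Category keywords are the key problem that text classification must solve first. Building on a study of classifying text with category keywords and the TF-IDF algorithm, an improved TF-IDF algorithm is proposed. First, a category keyword library is built, then expanded and deduplicated, which overcomes the vector space model's inability to adjust weights well. A document-length weight is then added to correct the weights of the keywords in each document, which effectively addresses the insufficient class-discriminating ability of the original feature terms. Using a Bayesian classification method, experiments verify the effectiveness of the algorithm and show improved text classification accuracy.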

English abstract:

Establishing category keywords is the key problem in text classification and should be solved first. Building on the classification of text using category keywords and the TF-IDF algorithm, an improved TF-IDF algorithm is proposed to overcome the shortcoming of the vector space model, which cannot adjust weights well. First, a category keyword library is established, then expanded and deduplicated. The weight of each keyword in a document is corrected by adding a document-length weight, which effectively addresses the insufficient class-discrimination ability of the original feature terms. Using a Bayesian classification method together with experiments, the effectiveness of the algorithm is verified and the accuracy of text classification is improved.
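The abstract does not give the improved weighting formula, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a term weight of (term count / document length) multiplied by a smoothed IDF, and feeds those weights into a simple multinomial naive Bayes classifier with Laplace smoothing. The function names (`tfidf_length_weighted`, `train_naive_bayes`, `predict`), the smoothing constant `alpha`, and the toy documents are all hypothetical. The category keyword library step (building, expanding, deduplicating) is not modelled here.

```python
import math
from collections import Counter, defaultdict


def tfidf_length_weighted(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed formula (not taken from the paper):
        weight = (count / doc_length) * (log((1 + N) / (1 + df)) + 1)
    Dividing by doc_length is the document-length correction.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document

    weighted = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against empty documents
        weighted.append({
            term: (count / length)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return weighted


def train_naive_bayes(weighted_docs, labels, alpha=1.0):
    """Fit multinomial naive Bayes on the weighted term features."""
    class_counts = Counter(labels)
    term_mass = defaultdict(Counter)  # label -> term -> summed weight
    vocab = set()
    for weights, label in zip(weighted_docs, labels):
        for term, weight in weights.items():
            term_mass[label][term] += weight
            vocab.add(term)

    model = {}
    for label, n_label in class_counts.items():
        mass = term_mass[label]
        denom = sum(mass.values()) + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(n_label / len(labels)),
            "log_like": {
                term: math.log((mass.get(term, 0.0) + alpha) / denom)
                for term in vocab
            },
        }
    return model


def predict(model, tokens):
    """Return the label with the highest posterior log-score."""
    counts = Counter(tokens)

    def score(label):
        params = model[label]
        return params["log_prior"] + sum(
            count * params["log_like"][term]
            for term, count in counts.items()
            if term in params["log_like"]  # ignore unseen terms
        )

    return max(model, key=score)


# Toy data, invented for illustration only.
docs = [
    ["laser", "beam", "laser", "optics"],
    ["beam", "optics", "lens"],
    ["stock", "market", "price"],
    ["market", "price", "trade", "stock"],
]
labels = ["physics", "physics", "finance", "finance"]

weights = tfidf_length_weighted(docs)
model = train_naive_bayes(weights, labels)
print(predict(model, ["laser", "lens"]))
print(predict(model, ["stock", "trade"]))
```

Dividing the raw count by the document length keeps long documents from dominating the class statistics simply because they contain more tokens. Other length corrections, such as a logarithmic or pivoted normalization, would slot into the same place.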

Journal information

Acta Physica Sinica (《物理学报》)
Peking University Core Journal (2011 edition)

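Supervising body: Chinese Academy of Sciences
Sponsors: Chinese Physical Society; Institute of Physics, Chinese Academy of Sciences
Editor-in-chief: Ouyang Zhongcan (欧阳钟灿)
Address: P.O. Box 603, Beijing (Institute of Physics, Chinese Academy of Sciences)
Postal code: 100190
Email: apsoffice@iphy.ac.cn
Phone: 010-82649026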

International standard serial number: ISSN 1000-3290
Domestic serial number: CN 11-1958/O4
Postal distribution code: 2-425

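Awards:
First National Periodical Award (1999); Chinese Academy of Sciences Outstanding Journal Special Prize (2000); in 2001, named a "double-high" journal in the top tier of science and technology journals, ranking 12th among Chinese journals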

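Indexed in domestic and international databases:
Chemical Abstracts, web edition (USA); abstract and citation database (Netherlands); Engineering Index (USA); Science Citation Index Expanded (USA); Science Abstracts database (UK); Japan Science and Technology Agency database (Japan); China Science and Technology Core Journals (China); Peking University Core Journals (China; 2000, 2004, 2008, 2011, and 2014 editions)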

Times cited: 49876