

A Comparative Study of Sentiment Orientation Classification Techniques for Chinese Text

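ISSN: 1009-8054
Journal: 信息安全与通信保密 (Information Security and Communications Privacy)
Date: 0
Pages: 785-794
Language: Chinese
Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] School of Information Security Engineering, Shanghai Jiao Tong University, Shanghai 200240; [2] Comprehensive Planning Office, Shanghai Bureau of Naval Equipment, Shanghai 200240
Funding: Natural Science Foundation project (Grant No. 60672068); Shanghai Science and Technology Commission key technology project (No. 08511501902).
Related project: Research on Core Technologies for Analyzing Trends in Online Public Opinion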

Keywords: semantic orientation classification of Chinese text, stop word list, feature selection, feature weighting, naïve Bayesian classifier, support vector machine

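Chinese abstract:

With the growth of the Internet in recent years, the number of online reviews is increasing steadily. Mining and analyzing these reviews to identify their sentiment orientation can provide important decision support for users, enterprises, and governments. Using the naïve Bayes and support vector machine classification models from machine learning, this paper studies sentiment orientation classification of Chinese text under different combinations of stop word lists, feature selection methods, and feature weighting methods. The results show that combining a stop word list that retains the parts of speech carrying sentiment information, document frequency as the feature selection method, and a support vector machine classifier based on absolute term frequency achieves relatively good classification performance.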

English abstract:

With the wide spread of the Internet in recent years, the number of online reviews has grown rapidly. Analyzing these reviews and identifying the semantic orientation they contain can provide important decision support for customers, enterprises, and government organizations. Naïve Bayesian and support vector machine classifiers, both machine learning techniques, are applied to semantic orientation classification of Chinese text under different combinations of stop word lists, feature selection methods, and feature weighting methods. The experimental results show that sentiment orientation classification achieves high performance when using a stop word list that retains most parts of speech carrying semantic information, document frequency as the feature selection method, and a support vector machine classifier based on term frequency.
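
The pipeline summarized in the abstracts (stop word filtering, document-frequency feature selection, term-frequency weighting, then a classifier) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the stop word list, the `min_df` threshold, the toy reviews, and all function and class names are hypothetical. The text is assumed to be already word-segmented with spaces. To stay dependency-free, the sketch uses a multinomial naïve Bayes classifier, one of the two models the paper compares, rather than the support vector machine that the paper reports as performing best.

```python
import math
from collections import Counter

# Hypothetical stop word list: only function words and pronouns are dropped,
# so sentiment-bearing adjectives, adverbs, and verbs are kept.
STOP_WORDS = {"的", "了", "是", "我"}

# Toy, invented training data (pre-segmented with spaces).
TRAIN = [
    ("这 部 电影 很 好看", "pos"),
    ("服务 很 好 我 喜欢", "pos"),
    ("质量 好 喜欢", "pos"),
    ("这 部 电影 很 差", "neg"),
    ("服务 差 我 讨厌", "neg"),
    ("质量 很 差 讨厌", "neg"),
]

def tokenize(text):
    """Split pre-segmented text on whitespace and drop stop words."""
    return [tok for tok in text.split() if tok not in STOP_WORDS]

def select_by_document_frequency(docs, min_df=2):
    """Keep the terms that occur in at least min_df documents."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    return sorted(term for term, count in df.items() if count >= min_df)

def term_frequency_vector(tokens, vocabulary):
    """Absolute term-frequency weighting: raw counts per vocabulary term."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]

class MultinomialNaiveBayes:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, vectors, labels):
        self.classes = sorted(set(labels))
        n_features = len(vectors[0])
        self.log_prior = {}
        self.log_likelihood = {}
        for c in self.classes:
            rows = [v for v, y in zip(vectors, labels) if y == c]
            self.log_prior[c] = math.log(len(rows) / len(vectors))
            totals = [sum(col) for col in zip(*rows)]  # per-term counts in class c
            denom = sum(totals) + n_features
            self.log_likelihood[c] = [math.log((t + 1) / denom) for t in totals]
        return self

    def predict(self, vector):
        def score(c):
            weights = self.log_likelihood[c]
            return self.log_prior[c] + sum(x * w for x, w in zip(vector, weights))
        return max(self.classes, key=score)

train_tokens = [tokenize(text) for text, _ in TRAIN]
vocabulary = select_by_document_frequency(train_tokens, min_df=2)
vectors = [term_frequency_vector(tokens, vocabulary) for tokens in train_tokens]
model = MultinomialNaiveBayes().fit(vectors, [label for _, label in TRAIN])

def classify(text):
    """Classify one pre-segmented review as 'pos' or 'neg'."""
    return model.predict(term_frequency_vector(tokenize(text), vocabulary))
```

Calling `classify("质量 好")` or `classify("电影 差")` runs a new review through the same stop word filter, vocabulary, and weighting as the training data. Replacing the classifier with a linear support vector machine from an external library would follow the same interface: fit on `vectors`, then predict on a term-frequency vector.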
