

Research on Sentiment Classification Based on Sentiment Key Sentence Extraction (基于情感关键句抽取的情感分类研究)

ISSN: 1000-1239
Journal: Journal of Computer Research and Development (计算机研究与发展)
Date: 2012.11.11
Pages: 2376-2382
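Classification (CLC): TP391.1 [Automation and Computer Technology / Computer Application Technology; Automation and Computer Technology / Computer Science and Technology]
Affiliations: [1] Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; [2] University of Chinese Academy of Sciences, Beijing 100049
Funding: Key Program of the National Natural Science Foundation of China (60933005); National High Technology Research and Development Program of China (863 Program) (2010AA012500); National Natural Science Foundation of China (60803085)
Related project: New Theories and Methods for Web Search and Mining: Theories and Methods of Web Search and Mining in Support of Public Opinion Monitoring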

Keywords: sentiment classification, key sentence, classifier combination, co-training, supervised learning, semi-supervised learning

Chinese abstract (translated):

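A key problem in sentiment analysis is determining whether the polarity of a document is positive or negative. The accuracy of sentiment classification rarely reaches that of ordinary topic-based text classification, because sentiment classification is a harder and more complex task. When judging a document's sentiment polarity, different sentences contribute to the overall sentiment to different degrees, so distinguishing a document's key sentences from its detail sentences helps improve sentiment classification performance. Key sentences are usually short and discriminative, whereas detail sentences are complex and varied and readily introduce ambiguity. The key sentence extraction algorithm considers three kinds of attributes: a sentiment attribute, a position attribute, and a keyword attribute. To better exploit the differences and complementarity between key sentences and detail sentences, the extracted key sentences are used in both supervised and semi-supervised sentiment classification. Supervised sentiment classification uses classifier combination; semi-supervised sentiment classification uses the co-training algorithm. Experiments on eight domains show that the proposed method clearly outperforms the baseline, demonstrating that the sentiment key sentence extraction algorithm is effective.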
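The abstract names the three attributes used to pick key sentences but not how they are scored. The Python sketch below shows one way a scorer of that shape could work; it is not the authors' algorithm. The lexicons `SENTIMENT_WORDS` and `SPECIAL_WORDS`, the equal weights, the first/last-sentence position rule, the top-`k` cutoff, and all function names are hypothetical, and the toy review is in English even though the paper works on Chinese text.

```python
import re

# Hypothetical cue lexicons: the abstract names the three attributes but not
# the actual word lists, weights, or scoring formula used by the authors.
SENTIMENT_WORDS = {"good", "great", "excellent", "love", "bad", "poor", "terrible", "awful"}
SPECIAL_WORDS = ("overall", "in short", "in conclusion", "recommend")


def split_sentences(text):
    """Naive splitter on '.', '!' and '?' (an assumption; real reviews need more care)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def score_sentence(sentence, index, n, w_sent=1.0, w_pos=1.0, w_special=1.0):
    """Weighted sum of the three attributes for one sentence (hypothetical weights)."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    # Sentiment attribute: fraction of tokens found in the sentiment lexicon.
    sentiment = sum(t in SENTIMENT_WORDS for t in tokens) / max(len(tokens), 1)
    # Position attribute: assume the verdict tends to sit in the first or last sentence.
    position = 1.0 if index in (0, n - 1) else 0.0
    # Special-words attribute: summarizing cue phrases such as "overall".
    lowered = sentence.lower()
    special = 1.0 if any(cue in lowered for cue in SPECIAL_WORDS) else 0.0
    return w_sent * sentiment + w_pos * position + w_special * special


def extract_key_sentences(text, k=2):
    """Return (key_sentences, detail_sentences), each kept in original document order."""
    sentences = split_sentences(text)
    n = len(sentences)
    ranked = sorted(range(n), key=lambda i: score_sentence(sentences[i], i, n), reverse=True)
    key_idx = set(ranked[:k])
    key = [s for i, s in enumerate(sentences) if i in key_idx]
    detail = [s for i, s in enumerate(sentences) if i not in key_idx]
    return key, detail


review = "We stayed three nights. The staff were great. The room had a small desk. Overall, I recommend it."
key, detail = extract_key_sentences(review, k=2)
print(key)     # ['We stayed three nights', 'Overall, I recommend it']
print(detail)  # ['The staff were great', 'The room had a small desk']
```

With equal weights, the position bonus alone lifts the neutral opening sentence above "The staff were great", which illustrates why the attribute weights (not given in the abstract) matter.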

English abstract:

A key problem of sentiment analysis is to determine whether the polarity of a review is positive (thumbs up) or negative (thumbs down). Unlike topic-based text classification, where high accuracy can be achieved, sentiment classification is a hard and complicated task. One of the main challenges for document-level sentiment classification is that not every part of a document is equally informative for inferring the polarity of the whole document. Thus, distinguishing key sentences from trivial sentences helps improve sentiment classification performance. We divide a document into key sentences and detailed sentences. Key sentences are usually brief but discriminative, while detailed sentences are diverse and ambiguous. For key sentence extraction, our approach takes three attributes into account: a sentiment attribute, a position attribute, and a special-words attribute. To exploit the discrepancy and complementarity between key sentences and detailed sentences, we incorporate both in supervised and semi-supervised learning. In supervised sentiment classification, a classifier combination approach is adopted because the original document is divided into two different and complementary parts; in semi-supervised sentiment classification, a co-training algorithm is proposed to incorporate unlabeled data. Experimental results across eight domains show that our method outperforms the baseline and that the key sentence extraction is effective.
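The abstract treats the key sentences and the detailed sentences as two complementary views of each document but does not specify the base classifier, the combination rule, or the co-training schedule. The Python sketch below fills those gaps with assumptions: a multinomial naive Bayes classifier per view, a weighted average of the two views' posteriors for classifier combination, and a simplified co-training loop in which each view labels its single most confident unlabeled document per round. `NaiveBayes`, `combine`, `co_train`, the weight `w_key`, and the toy data are hypothetical and are not the authors' implementation.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (a stand-in base classifier)."""

    def fit(self, docs, labels):
        self.labels = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.labels}
        self.counts = {c: Counter() for c in self.labels}
        for doc, c in zip(docs, labels):
            self.counts[c].update(doc.lower().split())
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.labels}
        return self

    def predict_proba(self, doc):
        scores = {}
        for c in self.labels:
            s = self.log_prior[c]
            for tok in doc.lower().split():
                if tok in self.vocab:  # words unseen in training are ignored
                    s += math.log((self.counts[c][tok] + 1) / (self.totals[c] + len(self.vocab)))
            scores[c] = s
        top = max(scores.values())  # subtract the max before exp for numerical stability
        exp = {c: math.exp(v - top) for c, v in scores.items()}
        z = sum(exp.values())
        return {c: v / z for c, v in exp.items()}


def combine(p_key, p_detail, w_key=0.5):
    """Classifier combination: weighted average of the two views' posteriors."""
    return {c: w_key * p_key[c] + (1 - w_key) * p_detail[c] for c in p_key}


def co_train(labeled, unlabeled, rounds=3, per_round=1):
    """Simplified co-training over the key-sentence view (0) and detail view (1).

    labeled: list of (key_text, detail_text, label); unlabeled: list of (key_text, detail_text).
    Each round, each view's classifier labels its most confident unlabeled examples.
    """
    labeled, pool = list(labeled), list(unlabeled)

    def fit_views():
        ys = [y for _, _, y in labeled]
        return (NaiveBayes().fit([k for k, _, _ in labeled], ys),
                NaiveBayes().fit([d for _, d, _ in labeled], ys))

    for _ in range(rounds):
        if not pool:
            break
        for view, clf in enumerate(fit_views()):
            if not pool:
                break
            # (label, confidence) of this view's prediction for every pooled document.
            best = [max(clf.predict_proba(ex[view]).items(), key=lambda kv: kv[1]) for ex in pool]
            chosen = sorted(range(len(pool)), key=lambda i: best[i][1], reverse=True)[:per_round]
            for i in sorted(chosen, reverse=True):  # pop from the end so indices stay valid
                key_text, detail_text = pool.pop(i)
                labeled.append((key_text, detail_text, best[i][0]))
    clf_key, clf_detail = fit_views()  # final classifiers on the enlarged labeled set
    return clf_key, clf_detail, labeled


train = [("great hotel", "room was clean", "pos"), ("excellent stay", "staff helpful", "pos"),
         ("terrible hotel", "room was dirty", "neg"), ("awful stay", "staff rude", "neg")]
unlabeled = [("excellent hotel", "staff helpful"), ("awful hotel", "room was dirty")]
clf_key, clf_detail, grown = co_train(train, unlabeled)
p = combine(clf_key.predict_proba("great stay"), clf_detail.predict_proba("room was clean"))
print(max(p, key=p.get))  # pos
```

Averaging posteriors lets a confident view outvote an uncertain one. In this toy run the key-sentence view is far more decisive than the detail view, which mirrors the abstract's claim that key sentences are the more discriminative part.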
