Microblog Sentiment Classification Based on Sentiment Word Vectors

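ISSN: 1003-0077
Journal: Journal of Chinese Information Processing (《中文信息学报》)
Date: 0
Classification: TP391 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
Author affiliation: Key Laboratory of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190
Funding: National Basic Research Program of China ("973" Program) (2012CB316303, 2014CB340401); National High Technology Research and Development Program of China ("863" Program) (2012AA011003); Key Program of the National Natural Science Foundation of China (61232010); sub-project of the National Key Technology R&D Program (2012BAH46804).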

Authors: Du Hui (杜慧)[1,2], Xu Xueke (徐学可)[1], Wu Dayong (伍大勇)[1], Liu Yue (刘悦)[1], Yu Zhihua (余智华)[1], Cheng Xueqi (程学旗)[1]

Keywords: text categorization, text representation, supervised explicit semantic representation

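Abstract (translated from the Chinese):

Text representation, a fundamental problem in text classification, has long received wide attention. There are currently three main approaches to text representation: the bag-of-words model, latent semantic representation, and knowledge-base-driven explicit semantic representation. This paper first analyzes and compares the effect of these three approaches on text classification. The experiments show that knowledge-base-driven explicit semantic representation does not improve classification performance as expected. Analysis indicates that the cause is that explicit semantic representation tends to introduce noise when it expands the document representation. To address this problem, the paper proposes a supervised explicit semantic representation method. The method uses the label information of the dataset to identify the core concepts in a document that are most relevant to classification, and then expands those core concepts to form the document's explicit semantic representation. Results on three standard classification datasets confirm the effectiveness of the proposed text representation method.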

English abstract:

Text representation, a fundamental problem in text categorization, has attracted wide attention. Currently, there are three main approaches to text representation: the bag-of-words model, latent semantic representation, and knowledge-based explicit semantic representation. This paper analyzes and compares the effects of these methods when applied to text categorization. Experiments show that knowledge-based explicit semantic representation does not improve text categorization performance as expected. To tackle the problem that knowledge-based explicit semantic representation easily introduces noise when extending text, a supervised explicit semantic representation method is proposed. The label information of the dataset is used to identify the most relevant concepts in a document, and the document is then given an explicit semantic representation by expanding those key concepts. Results on three datasets confirm the effectiveness of the proposed method.
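
The idea in the abstract (select label-relevant core concepts first, expand only those) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not say how relevance is measured or how concepts are expanded, so everything below is an assumption made for illustration: the scoring rule (largest gain of P(class | concept) over the class prior), the function names `concept_scores` and `represent`, the `top_k` cutoff, and the hand-written `related` map standing in for a knowledge base.

```python
from collections import Counter


def concept_scores(docs, labels):
    """Score each concept by how strongly it is tied to a single class.

    Assumed measure (not from the paper): the maximum over classes of
    P(class | concept) - P(class), estimated from document counts.
    """
    n = len(docs)
    class_count = Counter(labels)
    concept_count = Counter()
    joint = Counter()
    for concepts, label in zip(docs, labels):
        for c in set(concepts):
            concept_count[c] += 1
            joint[(c, label)] += 1
    return {
        c: max(joint[(c, y)] / df - class_count[y] / n for y in class_count)
        for c, df in concept_count.items()
    }


def represent(doc, scores, related, top_k=2):
    """Keep the top_k highest-scoring concepts of a document and expand only those."""
    core = sorted(set(doc), key=lambda c: (-scores.get(c, 0.0), c))[:top_k]
    features = Counter(core)
    for c in core:
        # Expansion is restricted to core concepts, so weakly relevant
        # concepts cannot pull in unrelated neighbours (the noise source).
        for r in related.get(c, []):
            features[r] += 1
    return core, dict(features)


# Toy data: each document is already mapped to a list of concepts (hypothetical).
docs = [
    ["football", "goal", "news"],
    ["football", "league", "news"],
    ["stock", "market", "news"],
    ["stock", "bank", "news"],
]
labels = ["sports", "sports", "finance", "finance"]
# Hypothetical stand-in for knowledge-base relatedness links.
related = {"football": ["sport"], "stock": ["finance_topic"], "news": ["media"]}

scores = concept_scores(docs, labels)
core, features = represent(docs[0], scores, related, top_k=2)
print(core, features)
```

In this toy setup the concept `news` occurs equally in both classes, so it is not chosen as a core concept and its neighbour `media` never enters the representation. An unsupervised expansion that expanded every concept would have added it.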
