Sentiment analysis is an active research topic in natural language processing (NLP), with broad practical value and theoretical significance. Sentiment lexicon construction, a basic task in sentiment analysis, classifies each word as positive, neutral, or negative according to its sentiment orientation. Constructing a Chinese sentiment lexicon, however, faces two main problems: 1) many sentiment words are polysemous or ambiguous, so the orientation of a word varies with its context, which makes that orientation difficult to compute; 2) according to related work in China and abroad, few resources are available for building Chinese sentiment lexicons. Because abundant corpora and lexicons exist for English sentiment analysis, this paper uses a machine translation system, combines constraint information from bilingual (English and Chinese) resources, and computes the sentiment orientation of Chinese words with the label propagation (LP) algorithm. Experimental results in four domains show that the method yields a Chinese sentiment lexicon with high classification precision that covers domain-specific context.
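The abstract does not specify how the word graph is built or how the bilingual constraints enter the computation, so the following is only a minimal sketch of generic label propagation with clamped seeds, not the paper's implementation. It assumes a small hypothetical graph in which English seed words with known orientation are linked to Chinese words by translation edges, and Chinese words are linked to each other by similarity edges. The function name `label_propagation`, the toy vocabulary, and all edge weights are illustrative assumptions.

```python
import numpy as np

CLASSES = ["positive", "neutral", "negative"]


def label_propagation(weights, seed_labels, n_classes, n_iter=200, tol=1e-6):
    """Propagate seed labels over a weighted graph, clamping the seeds."""
    weights = np.asarray(weights, dtype=float)
    n = weights.shape[0]

    # Row-normalise edge weights into transition probabilities.
    row_sums = weights.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0
    transition = weights / row_sums

    # Unlabelled nodes start from a uniform class distribution.
    one_hot = np.eye(n_classes)
    scores = np.full((n, n_classes), 1.0 / n_classes)
    for node, label in seed_labels.items():
        scores[node] = one_hot[label]

    for _ in range(n_iter):
        updated = transition @ scores
        # Clamp the seeds so their known labels never drift.
        for node, label in seed_labels.items():
            updated[node] = one_hot[label]
        converged = np.abs(updated - scores).max() < tol
        scores = updated
        if converged:
            break
    return scores


# Hypothetical toy vocabulary: three English seeds and four Chinese words.
words = ["good", "bad", "table", "好", "坏", "桌子", "优秀"]
index = {w: i for i, w in enumerate(words)}

# Hypothetical edges: (word, word, weight). Translation edges link the two
# languages; the last edge is a Chinese-Chinese similarity edge.
edges = [
    ("good", "好", 1.0),
    ("bad", "坏", 1.0),
    ("table", "桌子", 1.0),
    ("好", "优秀", 0.5),
]

weights = np.zeros((len(words), len(words)))
for a, b, w in edges:
    weights[index[a], index[b]] = w
    weights[index[b], index[a]] = w

# English seeds with assumed known orientation (indices into CLASSES).
seeds = {index["good"]: 0, index["table"]: 1, index["bad"]: 2}

scores = label_propagation(weights, seeds, n_classes=len(CLASSES))
predicted = {w: CLASSES[int(np.argmax(scores[index[w]]))] for w in words}
```

In this sketch the Chinese words receive their orientation only through the graph: 优秀 has no translation edge to a seed and is labelled via its similarity edge to 好. A domain-specific lexicon would presumably require building the graph from in-domain corpora, which this example does not attempt.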