This paper describes the algorithms used in the COAE 2014 evaluation and analyzes them in light of the evaluation results. The evaluation comprised five tasks; this paper focuses on the three that are related to microblogs. For the task of discovering new microblog sentiment words and determining their polarity, the core of the method is to obtain candidate new words from the alignment output of the Google translation service and then to filter the frequent words by their average pointwise mutual information (PMI). For the microblog sentiment classification task, two methods are used: a traditional polarity classifier based on a sentiment lexicon, and a classifier based on conditional random fields (CRFs) that incorporates sentiment-word annotation. For the task of extracting opinion elements from microblog opinion sentences, candidate product names and attribute names are first extracted, in two separate steps, according to the betweenness and closeness of the nouns in a complex network built from all the nouns. Three methods are then used to extract the exact product attributes and the sentiment expressed toward them. The first is a sliding-window extraction strategy based on simple rules; the other two are supervised extraction strategies based on CRFs.
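The PMI-based filtering step for candidate new words can be illustrated with a minimal sketch. The abstract does not define how the average PMI is computed, so the code below makes an assumption: the PMI of a candidate is averaged over every way of splitting it into a left and a right part, with probabilities estimated from character n-gram counts in the corpus. The function names (`substring_counts`, `average_pmi`, `filter_candidates`) and the thresholds `min_count` and `min_avg_pmi` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def substring_counts(texts, max_len):
    """Count every character n-gram of length 1..max_len in the corpus."""
    counts = Counter()
    for text in texts:
        for n in range(1, max_len + 1):
            for i in range(len(text) - n + 1):
                counts[text[i:i + n]] += 1
    return counts


def average_pmi(word, counts, total):
    """Average PMI over all binary splits of `word` (assumed definition).

    Probabilities are approximated as count / total number of characters.
    """
    if len(word) < 2 or counts[word] == 0:
        return float("-inf")
    p_word = counts[word] / total
    scores = []
    for i in range(1, len(word)):
        p_left = counts[word[:i]] / total
        p_right = counts[word[i:]] / total
        scores.append(math.log2(p_word / (p_left * p_right)))
    return sum(scores) / len(scores)


def filter_candidates(candidates, texts, min_count=2, min_avg_pmi=1.0):
    """Keep frequent candidates whose average PMI reaches the threshold.

    Both thresholds are illustrative values, not the ones used in the paper.
    Returns (word, score) pairs sorted by descending average PMI.
    """
    if not candidates:
        return []
    max_len = max(len(w) for w in candidates)
    counts = substring_counts(texts, max_len)
    total = sum(len(t) for t in texts)
    scored = [
        (w, average_pmi(w, counts, total))
        for w in candidates
        if counts[w] >= min_count
    ]
    kept = [(w, s) for w, s in scored if s >= min_avg_pmi]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

In this sketch the candidate list would come from the alignment output of the translation service, and `texts` would be the microblog corpus. A candidate whose parts co-occur more often than chance predicts receives a high score, while a string that merely straddles two frequent words tends to score low and is dropped.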