该文提出一种基于多层次语言特征的弱监督的情感分析方法,先以少量情感词构成初始情感词典,用这些种子词汇作引导,根据评论文本在单词、短语及句子级别的语言特征结合上下文挖掘目标文本中潜在的具有情感倾向的词汇/短语。通过自训练不断扩充情感词典,最终得到一个具有领域特征的情感词典,并用所得到的情感词典对目标文本的情感倾向进行判断。与其他方法在同一数据上的结果相比,该方法以很小的词典规模取得了最高的F-score,并且得到的情感词含义明确。方法用于不同领域也取得了较高的精度,表明方法具有较好的领域适应性。
In this paper, a weakly supervised sentiment analysis approach is proposed. A few words are collected to construct an initial sentiment lexicon. These seed words are used to mine potential sentimental words in the target text. In this process, linguistic features at multi-levels are explored and the role of the context is examined. The lex- icon is expanded iteratively, and the final version is applied to classify the sentiment of a target document. Compared to results of previous studies on the same data, this approach achieves the best F-score while the constructed senti- ment lexicon is rather small. The experimental results also show that this approach is robust when applied to a texts of different domains.