To address the identification of new sentiment words in microblogs, an automatic identification method based on word association is proposed. First, because Chinese word segmentation software often splits a new word into several tokens, adjacent tokens are merged to form candidate new words. Second, to make full use of the semantic information in each word's context, a neural network is trained on the corpus to obtain a vector representation of each candidate. Finally, guided by an existing sentiment lexicon, an association ranking based on the lexicon word set is fused with a maximum-association ranking to select the final new sentiment words from the candidates. On the Task 3 corpus of COAE2014 (the Sixth Chinese Opinion Analysis Evaluation), the fused algorithm improves precision by at least 22% over Pointwise Mutual Information (PMI), Enhanced Mutual Information (EMI), Normalized Multi-word Expression Distance (NMED), New Word Probability (NWP), and a word-embedding-based new word identification method, indicating that it identifies new microblog sentiment words more effectively than these five methods.
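The three steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the scoring formulas, so the mean cosine similarity used for the set-based association, the maximum cosine similarity used for the max association, and the weighted sum used to fuse them are all assumptions. The function names, the `max_len` and `weight` parameters, and the toy vectors (which stand in for embeddings learned by a neural network) are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def adjacent_candidates(tokens: List[str], max_len: int = 3) -> List[str]:
    """Join every run of 2..max_len adjacent tokens into a candidate word."""
    candidates = []
    for n in range(2, max_len + 1):
        for i in range(len(tokens) - n + 1):
            candidates.append("".join(tokens[i:i + n]))
    return candidates


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def set_association(vec: Vector, seeds: List[Vector]) -> float:
    """Assumed set-based score: mean similarity to all lexicon words."""
    return sum(cosine(vec, s) for s in seeds) / len(seeds)


def max_association(vec: Vector, seeds: List[Vector]) -> float:
    """Assumed max score: similarity to the closest lexicon word."""
    return max(cosine(vec, s) for s in seeds)


def rank_candidates(cand_vecs: Dict[str, Vector],
                    seed_vecs: List[Vector],
                    weight: float = 0.5) -> List[str]:
    """Rank candidates by an assumed weighted sum of the two scores."""
    scores = {
        word: weight * set_association(vec, seed_vecs)
        + (1.0 - weight) * max_association(vec, seed_vecs)
        for word, vec in cand_vecs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Tokens as a segmenter might wrongly split them.
    print(adjacent_candidates(["给", "力", "啊"], max_len=2))
    # Toy 2-d vectors standing in for learned embeddings.
    seeds = [[1.0, 0.0], [0.9, 0.1]]
    cands = {"给力": [1.0, 0.05], "力啊": [0.0, 1.0]}
    print(rank_candidates(cands, seeds))
```

In this sketch the highest-ranked candidates would be kept as new sentiment words; how many to keep, and how the two rankings are actually fused, would follow the method described in the paper body.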