Collocations with a strong positive or negative orientation are valuable for sentiment analysis of text. This paper proposes a method for identifying the sentiment orientation of collocations based on hybrid language information. First, a probabilistic latent semantic model is determined for each of six collocation patterns, according to the characteristics of that pattern. These models are then used to identify the sentiment orientation of each collocation. Finally, for some collocations that contain a sentiment word, rules are applied to revise the previously assigned orientation. Experimental results on a corpus of car reviews show that the proposed method outperforms methods based solely on the probabilistic latent semantic model or solely on rules.
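The two-stage flow described above (a per-pattern model assigns a label, then rules revise it for collocations that contain a sentiment word) might be sketched as follows. This is only an illustration under stated assumptions: the pattern name `noun-adj`, the toy lexicon, the stub model standing in for a trained probabilistic latent semantic model, and the single override rule are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, Tuple

Collocation = Tuple[str, str]          # a pair of words, e.g. ("handling", "excellent")
Model = Callable[[Collocation], int]   # returns +1 (positive) or -1 (negative)


def classify(collocation: Collocation,
             pattern: str,
             models: Dict[str, Model],
             lexicon: Dict[str, int]) -> int:
    """Label a collocation with the model for its pattern, then apply a rule.

    Assumed rule (hypothetical): if any word of the collocation is a known
    sentiment word, its lexicon polarity replaces the model's label.
    """
    # Stage 1: label from the model chosen for this collocation pattern.
    label = models[pattern](collocation)

    # Stage 2: rule-based revision for collocations containing a sentiment word.
    for word in collocation:
        if word in lexicon:
            return lexicon[word]
    return label


# Toy stand-ins: a stub model that always answers "negative", and a tiny lexicon.
models = {"noun-adj": lambda c: -1}
lexicon = {"excellent": 1}

revised = classify(("handling", "excellent"), "noun-adj", models, lexicon)
kept = classify(("fuel consumption", "high"), "noun-adj", models, lexicon)
print(revised, kept)
```

In this sketch the first collocation has its model label overridden by the lexicon, while the second contains no sentiment word and keeps the label from the model.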