Microblog sentiment analysis is a key technique for studying public opinion on social networks. Emoticons and sentiment words in microblog posts are intuitive, explicit emotion features. The semantics of a post's content, by contrast, are implicit features, and they are often decisive in determining its sentiment. This paper therefore proposes a sentiment analysis method that fuses both kinds of features. First, a sentiment analysis dictionary, an internet slang dictionary, and an emoticon library are constructed, and the frequent feature word sets of microblog posts are defined. Initial sentiment clusters are then obtained from the maximal frequent itemsets of these frequent feature word sets. The initial clusters can overlap, because a single post may fall into more than one of them. To remove this overlap, an inter-cluster overlap elimination algorithm based on the extended semantic membership degree of short texts is proposed, which yields fully separated initial clusters. Finally, an agglomerative (hierarchical) sentiment clustering method is built on the semantic similarity matrix of the clusters. Sentiment classification experiments on the training corpus provided by the NLPCC2013 evaluation demonstrate the performance advantages of the proposed method. As a real-world case study, microblog data on the Malaysia Airlines disappearance incident from March 8 to April 8, 2014 are used to track how public sentiment changed as events unfolded, illustrating the method's practical effectiveness.
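The three clustering stages summarized above can be illustrated with a small Python sketch. The stages are: initial clusters from maximal frequent itemsets, overlap elimination by membership degree, and agglomerative merging over a cluster similarity matrix. Several assumptions apply. Each post is taken to be already reduced to a set of feature words, such as sentiment words, slang terms, and emoticon tokens. The dictionary construction and short-text semantic expansion are not modeled. The paper's extended semantic membership degree and cluster semantic similarity are replaced by simple word-overlap and average Jaccard stand-ins. All function names, the `min_support` and `threshold` parameters, and the sample data are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(docs, min_support):
    """Apriori-style level-wise search for all word sets contained in
    at least `min_support` posts. `docs` is a list of sets of words."""
    items = sorted({w for d in docs for w in d})
    level = [frozenset([w]) for w in items
             if sum(w in d for d in docs) >= min_support]
    frequent = list(level)
    while level:
        # Join frequent k-sets into (k+1)-set candidates, keep the frequent ones.
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == len(a) + 1}
        level = sorted((c for c in candidates
                        if sum(c <= d for d in docs) >= min_support), key=sorted)
        frequent.extend(level)
    return frequent


def maximal(itemsets):
    """Keep only itemsets that are not a proper subset of another frequent set."""
    return [s for s in itemsets if not any(s < t for t in itemsets)]


def membership(i, cluster, docs):
    """Hypothetical stand-in for the semantic membership degree: the share
    of post i's words that also occur in the other posts of the cluster."""
    others = set().union(*(docs[j] for j in cluster if j != i))
    return len(docs[i] & others) / len(docs[i]) if docs[i] else 0.0


def separate(clusters, docs):
    """Assign every post that sits in several clusters to the single cluster
    where its membership is highest (ties go to the earlier cluster)."""
    owners = {}
    for k, c in enumerate(clusters):
        for i in c:
            owners.setdefault(i, []).append(k)
    result = [set() for _ in clusters]
    for i, cands in owners.items():
        best = max(cands, key=lambda k: membership(i, clusters[k], docs))
        result[best].add(i)
    return [c for c in result if c]  # drop clusters emptied by reassignment


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster_sim(c1, c2, docs):
    """Stand-in cluster similarity: average pairwise Jaccard over posts."""
    pairs = [(i, j) for i in c1 for j in c2]
    return sum(jaccard(docs[i], docs[j]) for i, j in pairs) / len(pairs)


def agglomerate(clusters, docs, threshold):
    """Repeatedly merge the most similar pair of clusters until the best
    similarity in the (recomputed) similarity matrix drops below threshold."""
    clusters = [set(c) for c in clusters]
    while len(clusters) > 1:
        sim, a, b = max((cluster_sim(clusters[a], clusters[b], docs), a, b)
                        for a, b in combinations(range(len(clusters)), 2))
        if sim < threshold:
            break
        clusters[a] |= clusters.pop(b)  # b > a, so index a stays valid
    return clusters


def sentiment_clusters(docs, min_support=2, threshold=0.2):
    initial = []
    for s in maximal(frequent_itemsets(docs, min_support)):
        members = {i for i, d in enumerate(docs) if s <= d}
        if members not in initial:  # identical post sets form one cluster
            initial.append(members)
    return agglomerate(separate(initial, docs), docs, threshold)


# Hypothetical posts, already reduced to feature-word sets.
SAMPLE_POSTS = [
    {"pray", "hope", "safe"},
    {"pray", "hope", "MH370"},
    {"hope", "safe", "MH370"},
    {"angry", "cover-up", "MH370"},
    {"angry", "cover-up", "lies"},
]

print(sentiment_clusters(SAMPLE_POSTS))  # [{0, 1, 2}, {3, 4}]
```

On this toy input, the four maximal frequent itemsets produce four overlapping initial clusters. Overlap elimination reduces them to three disjoint ones, and the single merge step then separates the "hopeful" posts from the "angry" ones. Posts that contain no frequent itemset are left unclustered in this sketch, a simplification the original method may handle differently.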