Micro-blogs and other social media have become an important platform where people express emotions and share feelings, and analyzing the emotions expressed in micro-blogs has considerable commercial and social value. This paper proposes a lexicon-based, rule-driven approach to identify six emotions in micro-blog text: joy, sadness, anger, fear, disgust and surprise. Because emoticons are an important cue to emotion, an emoticon lexicon is built using the mutual information between emoticons and emotions. This lexicon is combined with a traditional emotion lexicon, and rules are defined to handle negation in emotion expressions when analyzing micro-blogs. The first corpus of Chinese micro-blogs manually annotated with the six emotions is built and used as the test set. The experimental results show that the traditional emotion lexicon, although it contains a large vocabulary, achieves only modest accuracy and coverage on social media text. Adding the emoticon lexicon significantly improves both the accuracy and the coverage of micro-blog emotion analysis.
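The two steps named in the abstract, building an emoticon lexicon from mutual information and applying lexicons with a negation rule, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration rather than the paper's actual procedure: pointwise mutual information (PMI) is used as the association measure, a small set of emotion-labeled posts is assumed to be available for counting, the negation rule simply ignores an emotion word preceded by a negator within a short window, and the names `build_emoticon_lexicon`, `classify`, `NEGATORS`, `min_count` and `window` are hypothetical.

```python
import math
from collections import Counter

EMOTIONS = ("joy", "sadness", "anger", "fear", "disgust", "surprise")

# Hypothetical negation words; the paper's actual negation rules are not given here.
NEGATORS = {"not", "never", "no"}


def build_emoticon_lexicon(labeled_posts, min_count=2):
    """Map each emoticon to the emotion with the highest positive PMI.

    labeled_posts: iterable of (emoticons_in_post, emotion_label) pairs.
    Assumption: PMI(e, c) = log2( P(e, c) / (P(e) * P(c)) ), estimated
    from post-level co-occurrence counts.
    """
    emoticon_counts = Counter()
    emotion_counts = Counter()
    joint_counts = Counter()
    total = 0
    for emoticons, emotion in labeled_posts:
        total += 1
        emotion_counts[emotion] += 1
        for emoticon in set(emoticons):  # count each emoticon once per post
            emoticon_counts[emoticon] += 1
            joint_counts[(emoticon, emotion)] += 1

    lexicon = {}
    for emoticon, n_emoticon in emoticon_counts.items():
        if n_emoticon < min_count:  # skip emoticons that are too rare to trust
            continue
        best_emotion, best_pmi = None, 0.0
        for emotion in EMOTIONS:
            joint = joint_counts[(emoticon, emotion)]
            if joint == 0:
                continue
            pmi = math.log2(joint * total / (n_emoticon * emotion_counts[emotion]))
            if pmi > best_pmi:
                best_emotion, best_pmi = emotion, pmi
        if best_emotion is not None:
            lexicon[emoticon] = best_emotion
    return lexicon


def classify(tokens, word_lexicon, emoticon_lexicon, window=2):
    """Return the emotion with the most lexicon hits, or None if there are none.

    Illustrative negation rule: an emotion word is ignored when a negator
    appears within `window` tokens before it. Emoticons are never negated.
    """
    scores = Counter()
    for i, token in enumerate(tokens):
        if token in emoticon_lexicon:
            scores[emoticon_lexicon[token]] += 1
        elif token in word_lexicon:
            preceding = tokens[max(0, i - window):i]
            if not any(t in NEGATORS for t in preceding):
                scores[word_lexicon[token]] += 1
    if not scores:
        return None  # the post is not covered by either lexicon
    return scores.most_common(1)[0][0]
```

Returning `None` for posts with no lexicon hits makes coverage directly measurable as the share of posts that receive a label, which is the quantity the abstract reports alongside accuracy. The English tokens and bracketed emoticon names one would pass in (for example `"[tears]"`) are stand-ins; real input would be segmented Chinese micro-blog text.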