A method is proposed for automatically building a sentiment lexicon from microblog emoticons. A large number of microblog posts containing emoticons are crawled from a microblogging platform, and each post is labeled with a sentiment polarity according to its emoticons, yielding a sentiment corpus. After preprocessing such as word segmentation and duplicate removal, sentiment words are extracted from the posts using part-of-speech rules, and the number of occurrences of each sentiment word in the positive and negative corpora is counted. The chi-square statistic of each word gives its sentiment strength, and the probabilities with which the word appears in positive and negative posts determine its polarity; together these produce the sentiment lexicon. This is a new approach to lexicon construction. Using a manually annotated sentiment lexicon as the benchmark, experiments show that the method labels sentiment words with an accuracy of about 80%, and that the generated lexicon achieves its best overall F-score, above 82%, when the sentiment-strength threshold θ is 20 or 30.
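The scoring step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `chi_square` and `build_lexicon`, the use of document frequency (one count per post) as the occurrence count, and the standard 2x2 contingency form of the chi-square statistic are all assumptions made for illustration. Posts are assumed to be already segmented and filtered to candidate sentiment words.

```python
from collections import Counter


def chi_square(a, b, n_pos, n_neg):
    """Chi-square statistic of the 2x2 table (word present/absent x positive/negative).

    a, b: number of positive / negative posts containing the word (assumed counts).
    """
    c = n_pos - a          # positive posts without the word
    d = n_neg - b          # negative posts without the word
    n = n_pos + n_neg
    denom = (a + b) * (c + d) * n_pos * n_neg
    if denom == 0:         # word in every post or in none: carries no signal
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def build_lexicon(pos_posts, neg_posts, theta=20.0):
    """Return {word: (polarity, strength)} for words whose strength is at least theta.

    pos_posts, neg_posts: lists of token lists labeled via emoticons.
    """
    n_pos, n_neg = len(pos_posts), len(neg_posts)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both corpora must be non-empty")
    # Assumption: count each word at most once per post.
    pos_df = Counter(w for post in pos_posts for w in set(post))
    neg_df = Counter(w for post in neg_posts for w in set(post))
    lexicon = {}
    for word in set(pos_df) | set(neg_df):
        a, b = pos_df[word], neg_df[word]
        strength = chi_square(a, b, n_pos, n_neg)
        if strength < theta:
            continue
        # Polarity: the class in which the word is relatively more probable.
        polarity = "positive" if a / n_pos > b / n_neg else "negative"
        lexicon[word] = (polarity, strength)
    return lexicon
```

Raising `theta` keeps only words whose distribution differs strongly between the two corpora, which would typically trade recall for precision; the abstract reports the best F-score at θ of 20 or 30.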