This paper proposes a method for building a sentiment lexicon for Chinese microblogs. Candidate network terms (Internet slang) are discovered with a context-entropy strategy and then filtered a second time by TF-IDF (term frequency-inverse document frequency). The SO-PMI (semantic orientation-pointwise mutual information) algorithm is then used to compute the sentiment orientation value of each network term on a labeled microblog corpus, yielding a sentiment lexicon of network terms. The lexicon is applied in microblog sentiment classification experiments, and its performance is compared with that of a naive Bayes classifier. The experimental results show that classifying directly with the microblog sentiment lexicon outperforms the naive Bayes classifier, and that the classification process is simple and fast.
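As an illustration only, the following minimal Python sketch shows how the main steps could fit together: a context-entropy score for candidate terms, a semantic-orientation score computed on a labeled corpus, and lexicon-based classification. It is not the implementation used in the experiments. The function names (`context_entropy`, `so_pmi`, `classify`), the smoothing constant `eps`, and the toy corpus are all hypothetical. The sketch also assumes a label-based variant of SO-PMI, in which a term's association with the positive and negative class labels stands in for the seed words of the classical formulation. The TF-IDF filtering step is omitted.

```python
import math
from collections import Counter


def context_entropy(neighbors):
    """Shannon entropy (bits) of the tokens observed next to a candidate term.

    A high value means the candidate appears in varied contexts, which is
    commonly taken as evidence that it is a free-standing word.
    """
    counts = Counter(neighbors)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def so_pmi(word, labeled_docs, eps=0.5):
    """Label-based semantic orientation (an assumption of this sketch).

    SO(word) = PMI(word, pos) - PMI(word, neg), which simplifies to
    log2( df_pos * n_neg / (df_neg * n_pos) ); eps is additive smoothing
    so that terms unseen in one class do not produce log(0).
    """
    n_pos = sum(1 for _, label in labeled_docs if label == "pos")
    n_neg = sum(1 for _, label in labeled_docs if label == "neg")
    df_pos = sum(1 for toks, label in labeled_docs if label == "pos" and word in toks)
    df_neg = sum(1 for toks, label in labeled_docs if label == "neg" and word in toks)
    return math.log2(((df_pos + eps) * n_neg) / ((df_neg + eps) * n_pos))


def classify(tokens, lexicon):
    """Sum the lexicon weights of the tokens and threshold the total at zero."""
    score = sum(lexicon.get(t, 0.0) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Toy labeled corpus of pre-segmented posts (hypothetical data).
corpus = [
    (["给力", "开心"], "pos"),
    (["给力", "喜欢"], "pos"),
    (["坑爹", "难过"], "neg"),
    (["坑爹", "给力"], "neg"),
]

# Build the lexicon for the candidate network terms only.
candidates = ["给力", "坑爹"]
lexicon = {w: so_pmi(w, corpus) for w in candidates}

print(classify(["给力", "开心"], lexicon))
print(classify(["坑爹", "难过"], lexicon))
```

Because classification reduces to a dictionary lookup and a sum per post, no model training is needed at prediction time, which is consistent with the simplicity and speed reported above.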