Chinese microblogs contain users' opinions on trending topics, and mining these opinions can support applications such as early warning of emergencies and public opinion monitoring. Most microblog research to date is based on English corpora, and the identification of opinion sentences in Chinese microblogs is usually mentioned only briefly within work on sentiment mining. Because of the distinctive stylistic features of Chinese microblogs, traditional opinion mining models for Chinese text do not achieve satisfactory results. Unlike existing sentiment mining work, this paper selects features based on an analysis of those stylistic features: in addition to sentiment features, it adds opinion-sentence features including assertive verbs, modal particles, degree adverbs, and fixed part-of-speech structures, and uses a CRFs (Conditional Random Fields) model to identify opinion sentences. Experimental results show that using sentiment features alone yields relatively high precision but a recall of only 32.1%; after the other opinion-sentence features are added, recall rises markedly to 61.8%. The method was applied to the "opinion sentence identification" evaluation task for Chinese microblogs organized in 2012 by the China Computer Federation (CCF) Technical Committee on China Information Technology, where it achieved good results.
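The feature set named in the abstract (sentiment words, assertive verbs, modal particles, degree adverbs, and fixed part-of-speech structures) could be turned into per-token features for a CRF roughly as follows. This is only a minimal sketch under stated assumptions: the lexicons, the POS tags, the single "degree adverb + adjective" pattern, and the function names `token_features` and `sentence_features` are all hypothetical illustrations, not the word lists or templates used in the paper. The sketch covers feature extraction only; training would still require a CRF toolkit.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicons; the paper's actual word lists are not given.
SENTIMENT_WORDS = {"好", "差", "喜欢", "讨厌"}
ASSERTIVE_VERBS = {"认为", "觉得", "主张", "建议"}
MODAL_PARTICLES = {"吧", "啊", "呢", "吗"}
DEGREE_ADVERBS = {"很", "非常", "太", "极其"}

# Hypothetical fixed POS structure: degree adverb ("d") followed by adjective ("a").
FIXED_POS_BIGRAMS = {("d", "a")}

Token = Tuple[str, str]  # (word, POS tag) from an upstream segmenter/tagger


def token_features(sent: List[Token], i: int) -> Dict[str, object]:
    """Build a feature dict for the i-th token of a segmented, POS-tagged sentence."""
    word, pos = sent[i]
    feats: Dict[str, object] = {
        "word": word,
        "pos": pos,
        "is_sentiment": word in SENTIMENT_WORDS,
        "is_assertive_verb": word in ASSERTIVE_VERBS,
        "is_modal_particle": word in MODAL_PARTICLES,
        "is_degree_adverb": word in DEGREE_ADVERBS,
        # True when this token and the next one match a fixed POS pattern.
        "starts_fixed_pos": (
            i + 1 < len(sent) and (pos, sent[i + 1][1]) in FIXED_POS_BIGRAMS
        ),
    }
    if i == 0:
        feats["BOS"] = True  # beginning of sentence
    if i == len(sent) - 1:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_features(sent: List[Token]) -> List[Dict[str, object]]:
    """Feature sequence for one sentence, in the dict-per-token shape CRF toolkits commonly accept."""
    return [token_features(sent, i) for i in range(len(sent))]


if __name__ == "__main__":
    example = [("我", "r"), ("认为", "v"), ("这个", "r"), ("手机", "n"),
               ("非常", "d"), ("好", "a"), ("啊", "y")]
    for (word, _), feats in zip(example, sentence_features(example)):
        active = [k for k, v in feats.items() if v is True]
        print(word, active)
```

Restricting the dictionary to `is_sentiment` alone would correspond to the sentiment-only baseline that the abstract reports as having low recall; the other flags stand in for the additional opinion-sentence features.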