中英文微博大都以单一语种来表述,而将近80%的藏文微博都是以藏汉混合文本形式呈现,若只针对藏文内容或中文内容进行情感倾向性分析会造成情感信息丢失,无法达到较好效果。根据藏文微博的表述特点,该文提出了基于多特征的情感倾向性分析算法,算法使用情感词、词性序列、句式信息和表情符号作为特征,并针对藏文微博常出现中文表述的情况,将中文的情感信息也作为特征进行情感计算,利用双语情感特征有效提高了情感倾向性分析的效果。实验显示,该方法对纯藏文表述的微博情感倾向性分析正确率可达到79.8%,针对藏汉双语表述的微博在加入中文情感词、中文标点符号等特征后,正确率能够达到82.8%。
While most Chinese or English micro-blogs are in just one single language, nearly 80% Tibetan Micro blogs are mixed text of Tibetan and Chinese languages. If emotion orientation analysis is only targeted at Tibetan or Chinese, this analysis would he partial and fail to achieve its goal. According to the expression features of Tibetan micro-hlogs, this paper puts forward the algorithm of multi-feature sentiment analysis, upon such features as emo- tional words, the sequence of part of speech, sentence information and emoticon signs. Dealing with Tibetan microblogs, this algorithm takes into consideration the emotional information of Chinese language and has improved the effect of sentiment analysis with the help bilingual information. The experimental results indicate that the sentiment analysis accuracy concerning monolingual Tibetan expression is 79.8%, which is boosted up to 82.8 % after taking into consideration of the features of Chinese emotional words and Chinese punctuations.