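Tibetan microblogs have distinctive grammatical features, so traditional methods struggle to achieve good results when classifying the sentiment of Tibetan text. By combining Tibetan syntactic structure with semantic feature vectors to build a semantic feature space, this paper proposes a semantic-space-based method for Tibetan microblog sentiment analysis. The method first generates syntactic structures with a syntax tree and combines them with semantic feature vectors to construct the feature space, then applies K-means clustering to form semantic cluster centroids, and takes the cluster-based TF-IDF values as the final microblog sentiment features. Experimental results show that the method's sentiment classification performance is better than that of both SVM + TF-IDF and naive Bayes + maximum entropy.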
Tibetan microblogs have unique grammatical features, and traditional classification methods that achieve good results elsewhere do not perform well on Tibetan text. This paper presents a sentiment classification method for Tibetan microblogs based on a semantic space that incorporates Tibetan syntactic structure. First, the method generates the syntactic structure using a syntax tree. Then it combines the syntactic structure with semantic feature vectors to construct the semantic feature space. In this feature space, it forms semantic cluster centroids by K-means clustering. Finally, it computes the sentiment feature values of each microblog as TF-IDF values based on the clusters. Experimental results show that this method classifies Tibetan microblog sentiment more accurately than SVM + TF-IDF and naive Bayes + maximum entropy.
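The clustering and cluster-based TF-IDF steps can be illustrated with a minimal sketch. It is not the paper's implementation: the syntax-tree step is omitted, the two-dimensional word vectors and English placeholder tokens are toy values invented for illustration, the plain `log(n / df)` IDF variant is an assumption, and the names `kmeans` and `cluster_tfidf` are hypothetical.

```python
import math
from collections import Counter


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; centroids start at the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assign = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


def cluster_tfidf(docs, word_to_cluster, k):
    """Compute TF-IDF over semantic cluster IDs instead of raw words."""
    cluster_docs = [[word_to_cluster[w] for w in doc if w in word_to_cluster]
                    for doc in docs]
    n = len(cluster_docs)
    # Document frequency: number of documents in which each cluster occurs.
    df = Counter(c for doc in cluster_docs for c in set(doc))
    features = []
    for doc in cluster_docs:
        tf = Counter(doc)
        total = len(doc) or 1
        features.append([
            (tf[c] / total) * math.log(n / df[c]) if df[c] else 0.0
            for c in range(k)
        ])
    return features


# Toy semantic vectors (assumed values; placeholders for Tibetan tokens).
vectors = {
    "good": (0.9, 0.1),
    "bad": (0.1, 0.9),
    "happy": (0.8, 0.2),
    "sad": (0.2, 0.8),
}
words = list(vectors)
k = 2
assign, centroids = kmeans([vectors[w] for w in words], k)
word_to_cluster = dict(zip(words, assign))

# Each microblog is a list of tokens; each feature vector has one entry per cluster.
docs = [["good", "happy"], ["bad", "sad"], ["good", "sad"]]
features = cluster_tfidf(docs, word_to_cluster, k)
for doc, vec in zip(docs, features):
    print(doc, [round(v, 3) for v in vec])
```

The resulting per-cluster vectors could then be passed to any standard classifier. Grouping words into clusters before weighting keeps the feature dimension at `k` rather than the vocabulary size, which is one plausible motivation for this design when labeled Tibetan data is sparse.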