To improve the accuracy of text sentiment orientation classification, this paper proposes a text sentiment orientation analysis method, bfsmPMI-SVM. In the preprocessing stage, the method filters out irrelevant stop words and sentences that express only weak sentiment toward the topic. Sentiment words are extracted with an improved PMI-IR algorithm and used to extend the positive and negative base word sets automatically. The mutual information (MI) algorithm is also improved by adding a word-frequency factor (f), a category-difference factor (b) and a sign factor (s) to the MI computation. Features selected with the improved MI are fused with several other text features, and an SVM then classifies the sentiment orientation of each text. In experiments on texts crawled from the food-safety domain, the proposed method achieves higher accuracy on positive texts, higher accuracy on negative texts, and higher recall and F1 than PMI-IR-SVM and MI-SVM.
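The seed-set extension step can be illustrated with the classic SO-PMI score: a word's semantic orientation is its summed PMI with the positive seeds minus its summed PMI with the negative seeds. The sketch below is a minimal illustration under stated assumptions, not the paper's improved PMI-IR. It replaces search-engine hit counts (the "IR" part) with document-level co-occurrence counts in a local corpus, and the smoothing constant, the `threshold` value, the helper names `pmi`, `so_pmi` and `extend_seeds`, and the toy corpus are all hypothetical.

```python
import math
from typing import Iterable, List, Set, Tuple


def pmi(w1: str, w2: str, docs: List[Set[str]], smoothing: float = 0.01) -> float:
    """PMI(w1, w2) = log2(P(w1, w2) / (P(w1) * P(w2))), estimated from
    document-level co-occurrence (an assumption; PMI-IR uses search hits).
    The small smoothing constant avoids log(0) for pairs that never co-occur."""
    n = len(docs)
    c1 = sum(1 for d in docs if w1 in d)
    c2 = sum(1 for d in docs if w2 in d)
    c12 = sum(1 for d in docs if w1 in d and w2 in d)
    p1, p2, p12 = (c1 + smoothing) / n, (c2 + smoothing) / n, (c12 + smoothing) / n
    return math.log2(p12 / (p1 * p2))


def so_pmi(word: str, pos_seeds: Set[str], neg_seeds: Set[str], docs: List[Set[str]]) -> float:
    """Semantic orientation: association with positive seeds minus association
    with negative seeds. > 0 leans positive, < 0 leans negative."""
    return (sum(pmi(word, p, docs) for p in pos_seeds)
            - sum(pmi(word, q, docs) for q in neg_seeds))


def extend_seeds(candidates: Iterable[str], pos_seeds: Set[str], neg_seeds: Set[str],
                 docs: List[Set[str]], threshold: float = 1.0) -> Tuple[Set[str], Set[str]]:
    """Add candidates whose |SO| exceeds a hypothetical threshold to the matching
    seed set. Scores use only the original seeds, so the result does not depend
    on the order of the candidates."""
    pos, neg = set(pos_seeds), set(neg_seeds)
    for w in candidates:
        if w in pos or w in neg:
            continue
        so = so_pmi(w, pos_seeds, neg_seeds, docs)
        if so > threshold:
            pos.add(w)
        elif so < -threshold:
            neg.add(w)
    return pos, neg


# Hypothetical toy corpus: each document is a set of tokens after preprocessing.
DOCS = [
    {"good", "safe", "fresh"},
    {"good", "safe"},
    {"bad", "toxic"},
    {"bad", "toxic", "expired"},
    {"good", "fresh"},
    {"bad", "expired"},
]

pos, neg = extend_seeds(["safe", "toxic", "fresh", "expired"], {"good"}, {"bad"}, DOCS)
print(sorted(pos), sorted(neg))  # → ['fresh', 'good', 'safe'] ['bad', 'expired', 'toxic']
```

Scoring every candidate against the original seeds keeps one pass order-independent; iterating the extension with the enlarged sets is a possible variant, at the risk of drifting away from the seeds.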
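For reference, the plain MI feature selection that the method builds on can be sketched as follows, using the common document-count estimate MI(t, c) = log2(A·N / ((A+B)(A+C))) and ranking each term by its best per-class score. This is only the baseline: the abstract does not give the exact form of the frequency factor (f), the category-difference factor (b) or the sign factor (s), so they are not reproduced here. The function names, the smoothing constant and the toy labelled corpus are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Set


def mi_scores(docs: List[Set[str]], labels: List[str],
              smoothing: float = 0.01) -> Dict[str, Dict[str, float]]:
    """Plain MI(t, c) = log2(A * N / ((A + B) * (A + C))), where
    A = docs of class c containing t, A + B = docs containing t,
    A + C = docs of class c, N = all docs. Smoothing keeps A = 0 finite."""
    n = len(docs)
    classes = sorted(set(labels))
    doc_freq = Counter(t for d in docs for t in d)   # A + B for each term
    class_size = Counter(labels)                      # A + C for each class
    in_class: Dict[str, Counter] = {c: Counter() for c in classes}
    for d, y in zip(docs, labels):
        in_class[y].update(d)                          # A for each (term, class)
    return {
        t: {c: math.log2((in_class[c][t] + smoothing) * n / (doc_freq[t] * class_size[c]))
            for c in classes}
        for t in doc_freq
    }


def select_features(docs: List[Set[str]], labels: List[str], k: int) -> List[str]:
    """Rank terms by their best per-class MI (ties broken alphabetically); keep the top k."""
    scores = mi_scores(docs, labels)
    ranked = sorted(scores, key=lambda t: (-max(scores[t].values()), t))
    return ranked[:k]


# Hypothetical labelled toy corpus.
DOCS = [
    {"safe", "fresh", "the"},
    {"safe", "tasty", "the"},
    {"toxic", "expired", "the"},
    {"toxic", "recall", "the"},
]
LABELS = ["pos", "pos", "neg", "neg"]

print(select_features(DOCS, LABELS, 6))
# → ['expired', 'fresh', 'recall', 'tasty', 'safe', 'toxic']
```

Note that plain MI ranks terms seen in a single document ("fresh") above terms that appear consistently in one class ("safe"), a known bias toward rare terms; a word-frequency factor such as the paper's (f) presumably targets this kind of effect.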