This paper improves the accuracy of word segmentation and sentiment analysis for internet slang and emoticons by expanding the range of sentiment dictionaries used. Customer reviews of a hotel serve as the raw data. Text features such as the counts of positive and negative sentiment words, negation words, degree adverbs, and special symbols are extracted and grouped into different feature combinations, and the optimal SVM (support vector machine) parameters C and g are found for each combination through k-fold cross-validation and grid search. An SVM is then trained and tested on each feature combination, and the accuracy of each combination is analyzed to identify the text features and feature combinations best suited to sentiment analysis of user reviews. The results show that, with each feature combination using its own optimal C and g, the combination of positive and negative sentiment words, negation words, sentiment score, and degree adverbs achieves the highest test accuracy, reaching 93.4%.
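The feature extraction and feature-combination steps summarized above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the toy lexicons, the degree-adverb weights, the scoring rule (a negation word flips the sign of the next sentiment word, and a degree adverb scales it), and the helper names `extract_features`, `feature_combinations`, and `to_vector` are all assumptions made for illustration. Reviews are assumed to be already segmented into tokens.

```python
from itertools import combinations

# Toy lexicons: hypothetical stand-ins for the paper's sentiment dictionaries.
POSITIVE = {"clean", "friendly", "great"}
NEGATIVE = {"dirty", "noisy", "rude"}
NEGATIONS = {"not", "never", "no"}
DEGREE_ADVERBS = {"very": 2.0, "slightly": 0.5}  # assumed weights
SPECIAL_SYMBOLS = {"!", "?", ":)", ":("}


def extract_features(tokens):
    """Count lexicon hits and compute a simple sentiment score for one review."""
    feats = {"pos": 0, "neg": 0, "negation": 0,
             "degree": 0, "symbol": 0, "score": 0.0}
    weight, flip = 1.0, 1.0  # modifiers waiting for the next sentiment word
    for tok in tokens:
        t = tok.lower()
        if t in NEGATIONS:
            feats["negation"] += 1
            flip = -flip
        elif t in DEGREE_ADVERBS:
            feats["degree"] += 1
            weight *= DEGREE_ADVERBS[t]
        elif t in POSITIVE or t in NEGATIVE:
            polarity = 1.0 if t in POSITIVE else -1.0
            feats["pos" if polarity > 0 else "neg"] += 1
            # Assumed scoring rule: polarity scaled by pending modifiers.
            feats["score"] += polarity * weight * flip
            weight, flip = 1.0, 1.0  # modifiers are consumed
        elif t in SPECIAL_SYMBOLS:
            feats["symbol"] += 1
    return feats


def feature_combinations(names, min_size=2):
    """Yield every subset of feature names with at least min_size members."""
    for k in range(min_size, len(names) + 1):
        yield from combinations(names, k)


def to_vector(feats, combo):
    """Project a feature dict onto one combination, in the combination's order."""
    return [float(feats[name]) for name in combo]


if __name__ == "__main__":
    review = ["The", "room", "was", "not", "very", "clean", "!"]
    feats = extract_features(review)
    print(feats)
    print(to_vector(feats, ("pos", "neg", "negation", "score", "degree")))
```

Each vector produced by `to_vector` would then be fed to an SVM, with C and g tuned separately per combination by grid search under k-fold cross-validation, as the abstract describes.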