Sentiment analysis aims to identify the sentiment information implicit in reviews and has broad application prospects in product reputation analysis, public opinion monitoring, and personalized recommendation. When evaluating consumers' attitudes toward a newly released product, few labeled reviews are usually available in that product's domain, whereas other related domains may offer large amounts of labeled reviews. Using the labeled reviews of other products to analyze sentiment toward a new product is a cross-domain sentiment classification problem. To address it, this paper introduces transfer learning. It applies the instance-transfer mechanism of TrAdaBoost, a classic transfer learning algorithm, to sentiment analysis, and proposes an improvement strategy for the imbalance between the classification accuracy of the positive and negative classes. Review samples are first selected according to their sample weights, and a second selection is then made that also takes classification confidence into account. Experimental results show that the improved algorithm raises overall classification accuracy to some extent and, more importantly, balances the accuracy of the positive and negative classes, which makes the classification results more useful in real-world applications.
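The abstract refers to TrAdaBoost's instance-transfer mechanism without spelling it out. The sketch below shows one round of the weight update from the original TrAdaBoost formulation (Dai et al., 2007), as background for readers unfamiliar with it. The weighted error is measured on the target-domain reviews only. Misclassified source-domain reviews are down-weighted by a fixed factor, which marks them as less transferable. Misclassified target-domain reviews are up-weighted, as in AdaBoost. The function name `tradaboost_update` and its arguments are hypothetical. The weak learner is assumed to be trained elsewhere, with only its 0/1 misclassification indicators passed in. The paper's own two-stage selection (by weight, then by classification confidence) is not specified in the abstract and is not reproduced here.

```python
import math
from typing import List, Sequence, Tuple


def tradaboost_update(
    w_src: Sequence[float],
    w_tgt: Sequence[float],
    miss_src: Sequence[int],
    miss_tgt: Sequence[int],
    n_rounds: int,
) -> Tuple[List[float], List[float], float]:
    """One round of the classic TrAdaBoost weight update (hypothetical helper).

    miss_src / miss_tgt are 0/1 indicators |h_t(x) - y|: 1 if the current
    weak classifier misclassifies the review, 0 otherwise.
    Returns (new source weights, new target weights, beta_t).
    """
    n = len(w_src)
    # Weighted error is computed on the small labeled target-domain set only.
    eps = sum(w * e for w, e in zip(w_tgt, miss_tgt)) / sum(w_tgt)
    if eps >= 0.5:
        raise ValueError("weak learner error on target data must be below 0.5")
    # Guard against division by zero when the target error is exactly 0.
    eps = max(eps, 1e-10)
    beta_t = eps / (1.0 - eps)
    # Fixed down-weighting factor for source samples (depends on n and rounds).
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / n_rounds))
    # Misclassified source reviews lose weight: they look less transferable.
    new_src = [w * beta ** e for w, e in zip(w_src, miss_src)]
    # Misclassified target reviews gain weight, as in AdaBoost.
    new_tgt = [w * beta_t ** (-e) for w, e in zip(w_tgt, miss_tgt)]
    return new_src, new_tgt, beta_t


if __name__ == "__main__":
    # Toy example: 4 source reviews, 4 target reviews, 10 boosting rounds.
    src, tgt, bt = tradaboost_update(
        [0.1] * 4, [0.1] * 4, [1, 0, 0, 1], [0, 0, 0, 1], n_rounds=10
    )
    print(src, tgt, bt)
```

In the full algorithm these weights are renormalized before the next weak learner is trained. Only the hypotheses from the second half of the rounds vote in the final classifier. A weight-based selection like the one the abstract describes would operate on the source weights this update produces.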