Building on co-training, we propose a mutual-learning model for cross-lingual sentiment analysis (CLSA) that combines a cross-sampling strategy with structural sentiment information. First, we use a heuristic identification method to extract sentiment expressions from the text as structural sentiment features and fuse them into the conventional n-gram feature space, yielding a feature space that represents sentiment more strongly. Second, within the conventional co-training framework, we introduce a cross-sampling strategy that transfers sentiment knowledge mutually between the unlabeled data of the two language views, so that the source and target languages are learned jointly and efficiently. Finally, we obtain a higher-performing sentiment classifier for the target language. Experimental results show that, compared with existing CLSA methods, the semi-supervised framework based on cross-sampling and structural sentiment fusion can efficiently use a small amount of labeled source-language data to mine sentiment knowledge from a large amount of unlabeled data, and thereby helps learn a better sentiment classifier for the target language.
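The abstract does not spell out the cross-sampling procedure, so the following is only a minimal sketch of one plausible reading: in each round, the examples that one language view labels most confidently are added to the training set of the other view. Everything here is an assumption made for illustration: the names `TinyNB`, `top_confident`, and `cross_sample_cotrain`, the naive Bayes classifier, the parameters `rounds` and `k`, the English/Chinese toy data, and the premise that every unlabeled example is available in both views (for instance through machine translation). The token lists stand in for the fused n-gram and sentiment-expression features.

```python
import math
from collections import Counter


class TinyNB:
    """Multinomial naive Bayes over token lists with add-one smoothing (illustrative only)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(doc)
        self.vocab = {t for c in self.classes for t in self.counts[c]}
        return self

    def predict_proba(self, doc):
        scores = {}
        for c in self.classes:
            # +1 reserves probability mass for tokens never seen in training
            total = sum(self.counts[c].values()) + len(self.vocab) + 1
            s = math.log(self.prior[c])
            for t in doc:
                s += math.log((self.counts[c][t] + 1) / total)
            scores[c] = s
        m = max(scores.values())
        z = sum(math.exp(v - m) for v in scores.values())
        return {c: math.exp(v - m) / z for c, v in scores.items()}


def top_confident(clf, pool, view, k):
    """Return (pool index, predicted label) for the k most confident predictions in one view."""
    scored = []
    for i, pair in enumerate(pool):
        proba = clf.predict_proba(pair[view])
        label = max(proba, key=proba.get)
        scored.append((proba[label], i, label))
    scored.sort(reverse=True)
    return [(i, label) for _, i, label in scored[:k]]


def cross_sample_cotrain(labeled, unlabeled, rounds=3, k=1):
    """labeled: [((src_tokens, tgt_tokens), label)]; unlabeled: [(src_tokens, tgt_tokens)]."""
    train_src = [(x[0], y) for x, y in labeled]
    train_tgt = [(x[1], y) for x, y in labeled]
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        clf_src = TinyNB().fit(*zip(*train_src))
        clf_tgt = TinyNB().fit(*zip(*train_tgt))
        picks_src = top_confident(clf_src, pool, 0, k)
        picks_tgt = top_confident(clf_tgt, pool, 1, k)
        # Cross-sampling (assumed reading): each view's picks train the OTHER view
        for i, label in picks_src:
            train_tgt.append((pool[i][1], label))
        for i, label in picks_tgt:
            train_src.append((pool[i][0], label))
        used = {i for i, _ in picks_src} | {i for i, _ in picks_tgt}
        pool = [p for i, p in enumerate(pool) if i not in used]
    # The final product is the target-language classifier
    return TinyNB().fit(*zip(*train_tgt))


# Toy data (hypothetical): source view in English, target view in Chinese
labeled = [
    ((["good", "great"], ["好", "很棒"]), "pos"),
    ((["bad", "awful"], ["差", "糟糕"]), "neg"),
]
unlabeled = [
    (["good", "nice"], ["好", "不错"]),
    (["bad", "poor"], ["差", "劣质"]),
    (["nice", "great"], ["不错", "很棒"]),
    (["poor", "awful"], ["劣质", "糟糕"]),
]

target_clf = cross_sample_cotrain(labeled, unlabeled, rounds=2, k=2)
print(target_clf.predict_proba(["不错"]))
```

In this toy run, the tokens 不错 and 劣质 never occur in the labeled target-language data, so a classifier trained on the labeled pairs alone has no evidence about them; after the cross-sampling rounds the target classifier has picked up their polarity from pseudo-labeled examples. A real system would replace `TinyNB` with a stronger classifier and add a confidence threshold to limit the noise that pseudo-labels introduce.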