Cross-domain sentiment classification has recently become an active research topic in natural language processing. It uses sentiment-labeled texts from a source domain to predict the polarity of texts in a target domain. Because data in different domains usually follow different distributions, traditional supervised classifiers often perform poorly in this setting. To address this problem, this paper proposes a classification model based on a weighted SimRank algorithm. The weighted SimRank algorithm is used to construct a Latent Feature Space (LFS) from feature similarities; a mapping function is then learned in the LFS and applied to re-map each sample. This reduces the mismatch in data distribution between domains and enables cross-domain sentiment classification. Experiments verify the effectiveness of the proposed method.
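The abstract does not specify the weighting scheme or how the LFS is built from the similarities, so the following is only a minimal sketch of the underlying idea: a generic iterative SimRank in which each node's incoming edge weights are normalised to sum to one. The normalisation, the function name `weighted_simrank`, the parameters `decay` and `iterations`, and the toy word-document graph are all assumptions made for illustration, not the formulation used in the paper.

```python
def weighted_simrank(weights, decay=0.8, iterations=10):
    """Iterative SimRank on a weighted directed graph (illustrative sketch).

    weights[i][j] is the non-negative weight of the edge i -> j (0 = no edge).
    Returns an n x n matrix of pairwise node similarities.
    """
    n = len(weights)
    # Assumption: normalise incoming weights so each node's in-weights sum to 1.
    in_totals = [sum(weights[i][j] for i in range(n)) for j in range(n)]
    norm = [[weights[i][j] / in_totals[j] if in_totals[j] > 0 else 0.0
             for j in range(n)] for i in range(n)]

    # Start from the identity: every node is maximally similar to itself.
    sim = [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]
    for _ in range(iterations):
        # tmp[i][b] = sum over j of sim[i][j] * norm[j][b]
        tmp = [[sum(sim[i][j] * norm[j][b] for j in range(n))
                for b in range(n)] for i in range(n)]
        # new[a][b] = decay * weighted average similarity of in-neighbours
        new = [[decay * sum(norm[i][a] * tmp[i][b] for i in range(n))
                for b in range(n)] for a in range(n)]
        for a in range(n):
            new[a][a] = 1.0  # self-similarity is fixed at 1
        sim = new
    return sim


# Hypothetical toy graph: three documents and three sentiment words.
nodes = ["doc_src", "doc_tgt", "doc_other", "excellent", "great", "boring"]
# (document index, word index, weight), e.g. a term frequency.
edges = [(0, 3, 2.0), (0, 4, 1.0), (1, 3, 1.0), (1, 4, 2.0), (2, 5, 3.0)]

n = len(nodes)
weights = [[0.0] * n for _ in range(n)]
for i, j, w in edges:
    weights[i][j] = w  # treat each edge as undirected
    weights[j][i] = w

sim = weighted_simrank(weights)
```

In this toy graph, "excellent" and "great" occur in the same documents and therefore receive a positive similarity score, while "boring" shares no neighbours with them and scores zero. How such feature similarities are turned into the LFS and the mapping function is not described in the abstract and is not covered by this sketch.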