Most existing cross-domain sentiment classification methods reduce the distribution difference between domains by extracting a common feature subspace or by establishing a mapping between domain-specific features. Because these methods ignore differences in the sentiment discriminability of features, features with weak sentiment orientation distort the result, and the learned subspace and feature mappings are often inaccurate. Features with high sentiment discriminability are important for sentiment classification, but existing feature selection methods are difficult to apply because the target-domain data are unlabeled. This paper proposes a fast cross-domain sentiment classification method based on feature selection (CSFS). First, a word co-occurrence matrix between source-domain features and target-domain features is constructed; the sentiment discriminability of each target-domain feature is evaluated from this matrix, and the features with high discriminability are selected as the target-domain feature space. Second, source-domain information is used to compute the sentiment strength of the selected target-domain features, in effect labeling them, and a target-domain classifier is built from the labeled features. Experimental results show that CSFS greatly improves the time efficiency of cross-domain classification while maintaining classification accuracy.
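The two-step pipeline described above can be illustrated with a minimal Python sketch. The abstract does not give the scoring formulas, so everything concrete here is an assumption made for illustration rather than the CSFS definition: source polarity is taken as a normalized difference of document counts, co-occurrence is counted within target-domain documents, a target feature's score is the co-occurrence-weighted mean of source polarities, discriminability is the absolute value of that score, and the classifier simply sums the scores of the selected features. All function names are hypothetical.

```python
from collections import Counter, defaultdict


def source_polarity(labeled_docs):
    """Assumed polarity in [-1, 1] per source word: (pos - neg) / (pos + neg),
    using counts of positive and negative documents that contain the word."""
    pos, neg = Counter(), Counter()
    for tokens, label in labeled_docs:
        (pos if label > 0 else neg).update(set(tokens))
    return {w: (pos[w] - neg[w]) / (pos[w] + neg[w]) for w in set(pos) | set(neg)}


def cooccurrence(target_docs, source_vocab):
    """co[t][s] = number of target documents containing both word t and
    source-domain word s (assumed document-level co-occurrence)."""
    co = defaultdict(Counter)
    for tokens in target_docs:
        words = set(tokens)
        anchors = words & source_vocab
        for t in words:
            for s in anchors:
                if s != t:
                    co[t][s] += 1
    return co


def score_target_features(co, polarity):
    """Assumed score: co-occurrence-weighted mean of source polarities."""
    scores = {}
    for t, row in co.items():
        total = sum(row.values())
        scores[t] = sum(n * polarity[s] for s, n in row.items()) / total
    return scores


def select_features(scores, k):
    """Keep the k features with the largest absolute score."""
    ranked = sorted(scores, key=lambda w: abs(scores[w]), reverse=True)
    return {w: scores[w] for w in ranked[:k]}


def classify(tokens, features):
    """Label a document +1 or -1 by the sign of its summed feature scores."""
    total = sum(features.get(w, 0.0) for w in tokens)
    return 1 if total >= 0 else -1
```

With a handful of labeled source documents (token list plus a +1/-1 label) and unlabeled target documents (token lists), the calls chain as `source_polarity`, `cooccurrence`, `score_target_features`, `select_features`, and finally `classify` on each target document. No training loop appears in this sketch, which is consistent with the efficiency claim in the abstract, though the actual method may differ in every detail.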