基于神经语言模型的词向量表示技术能够从大规模的未标注文本数据集中自动学习词语的有效特征表示,已经在许多自然语言处理任务及研究中取得重要进展.微博中的表情符号是微博情感分析最重要的特征之一,已有大量的研究工作在探索有效地利用表情符号来提升微博情感分类效果.借助词向量表示技术,为常用表情符号构建情感空间的特征表示矩阵R~E;基于向量的语义合成计算原理,通过矩阵R~E与词向量的乘积运算完成词义到情感空间的映射;接着输入到一个MCNN(Multi-channel Convolution Neural Network)模型,学习一个微博的情感分类器.整个模型称为EMCNN(Emotion-semantics enhanced MCNN),将基于表情符号的情感空间映射与深度学习模型MCNN结合,有效增强了MCNN捕捉情感语义的能力.EMCNN模型在NLPCC微博情感评测数据集上的多个情感分类实验中取得最佳分类性能,并在所有性能指标上超过目前已知文献中的最好分类效果.在取得以上分类性能提升的同时,EMCNN相对MCNN的训练耗时在主客观分类时减少了36.15%,在情感7分类时减少了33.82%.
Word embedding techniques based on neural language models can automatically learn effective word representations from massive unlabeled text datasets, and have brought substantial progress to many natural language processing tasks. Emoticons are among the most important emotion signals for microblog sentiment analysis, and a large body of work has explored how to exploit them effectively to improve microblog sentiment classification. In this work, the word embeddings of commonly used emoticons are used to construct an emotion space, represented by the feature matrix R~E. Following the principle of vector-based semantic composition, each word is projected from the semantic space into the emotion space by multiplying R~E with the word's embedding. The projected results are then fed into an MCNN (Multi-channel Convolution Neural Network) to learn a sentiment classifier for microblogs. The resulting model is named EMCNN, short for Emotion-semantics enhanced MCNN; it integrates the emoticon-based emotion-space projection into the deep learning model MCNN and thereby enhances MCNN's ability to capture emotion semantics. On the NLPCC microblog sentiment analysis datasets, EMCNN achieves the best performance in several sentiment classification experiments and surpasses the best previously reported results on all performance metrics. Compared with MCNN, EMCNN not only improves classification performance but also reduces training time, by 36.15% for subjectivity (subjective vs. objective) classification and by 33.82% for 7-class sentiment classification.
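The emotion-space projection described above can be illustrated with a minimal sketch. It assumes that R~E stacks one emoticon embedding per row and that every word embedding has the same dimensionality; the function name `project_to_emotion_space`, the array shapes, and the toy values are hypothetical and are not taken from the paper, which may additionally normalize or otherwise preprocess R~E.

```python
import numpy as np


def project_to_emotion_space(emoticon_embeddings, word_embeddings):
    """Project word embeddings into an emoticon-based emotion space.

    emoticon_embeddings: array of shape (k, d), one row per emoticon;
        it plays the role of the matrix R~E (assumed layout).
    word_embeddings: array of shape (n, d), one row per word of a microblog.
    Returns an array of shape (n, k); row i is the product of R~E with
    the embedding of word i.
    """
    r_e = np.asarray(emoticon_embeddings, dtype=float)
    words = np.asarray(word_embeddings, dtype=float)
    if r_e.ndim != 2 or words.ndim != 2 or r_e.shape[1] != words.shape[1]:
        raise ValueError("both inputs must be 2-D with the same embedding size")
    # (n, d) @ (d, k) applies the matrix-vector product to every word at once.
    return words @ r_e.T


# Toy data (hypothetical): 2 emoticons, 2 words, embedding dimension 3.
toy_r_e = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])
toy_words = np.array([[0.5, 0.2, 0.9],
                      [-1.0, 2.0, 0.0]])
toy_projected = project_to_emotion_space(toy_r_e, toy_words)
```

In this sketch each word ends up with one coordinate per emoticon, so the matrix handed to the downstream classifier has width k instead of d. If k is smaller than d, the input to the network shrinks, which would be consistent with the reported reduction in training time, although the abstract does not state that this is the cause.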