针对基于词向量的神经网络模型在产品属性情感分析中效果不佳的问题,提出一种集成离散特征和词向量特征的开关递归神经网络模型。首先,通过直接循环图为语句建模,采用开关递归神经网络模型完成产品属性情感分析任务;然后,在开关递归神经网络模型中集成离散特征和词向量特征;最后,分别在流水线、联合、折叠三种任务模型中完成属性提取和情感分析任务。以宏观F1分数作为评估指标,在Sem Eval-2014的笔记本电脑和餐馆评论数据集上做实验。开关递归神经网络模型的F1分数为:48.21%和62.19%,超过普通递归神经网络模型近1.5个百分点,因而开关递归神经网络能够有效捕获复杂特征,提升产品属性情感分析的效果。而集成离散特征和词向量特征的神经网络模型的F1分数为:49.26%和63.31%,均超过基线结果 0.5到1个百分点,表明离散特征和词向量特征互相促进,另一方面,也表明仅仅基于词向量的神经网络模型仍有提升空间。三种任务模型中,流水线模型的F1分数最高,表明应将属性提取和情感分析任务分开完成。
Concerning the poor results of product property sentiment analysis by the simple neural network model based on word vector, a gated recursive neural network model of integrating discrete features and word vector embedding was proposed. Firstly, the sentences were modeled with direct recurrent graph and the gated recursive neural network model was adopted to complete product property sentiment analysis. Then, the discrete features and word vector embedding were integrated in the gated recursive neural network. Finally, the feature extraction and sentiment analysis were completed in three different task models: pipeline model, joint model and collapsed model. The experiments were done on laptop and restaurant review datasets of SemEval-2014, the macro F1 score was used as the evaluation indicator. Gated recursive neural network model achieved the F1 scores as 48.21% and 62.19%, which were more than ordinary recursive neural network model by nearly 1.5 percentage points. The results indicate that the gated recursive neural network can capture complicated features and enhance the performance on product property sentiment analysis. The proposed neural network model integrated with discrete features and word vector embedding achieved the F1 scores as 49.26% and 63.31%, which are all higher than baseline methods by 0.5 to 1.0 percentage points. The results show that discrete features and word vector embedding can help each other, on the other hand, it's also shown that the neural network model based on only word embedding has the room for improvement. Among the three task models, the pipeline model achieves the highest F1 scores. Thus, it's better to complete feature extraction and sentiment analysis separately.