Convolutional neural network (CNN) models for text modeling and classification typically extract n-gram convolutional features in the literal order of the text, and therefore neglect the syntactic structure and semantic information carried by long-distance dependencies. This paper proposes a text classification method based on event convolutional features, which uses the semantic characteristics of events to compensate for this shortcoming. The method uses dependency relations to extract the set of events in a text, applies a CNN to extract event features, and classifies the text on that basis. In multi-class classification experiments on a Chinese news corpus, the method clearly outperforms traditional text classification methods and is more stable than CNN models that use n-gram features. The experimental results demonstrate the effectiveness of the model and the superiority of event features.
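The pipeline summarized above (event extraction from the text, followed by convolutional feature extraction) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how an event is represented, so the sketch assumes an event is a (subject, predicate, object) triple read off `nsubj` and `dobj`/`obj` dependency arcs. The token format, the function names `extract_events` and `event_conv_features`, and the embeddings and filters are all hypothetical.

```python
from collections import defaultdict


def extract_events(tokens, subj_labels=("nsubj",), obj_labels=("dobj", "obj")):
    """Collect (subject, predicate, object) triples from a dependency parse.

    Assumed token format: (index, word, head_index, dependency_label),
    with head_index 0 marking the root. This format is hypothetical.
    """
    words = {idx: word for idx, word, _, _ in tokens}
    args = defaultdict(dict)
    for idx, word, head, label in tokens:
        if label in subj_labels:
            args[head]["subj"] = word
        elif label in obj_labels:
            args[head]["obj"] = word
    events = []
    for head in sorted(args):
        roles = args[head]
        # A predicate with only one argument keeps None in the missing slot.
        events.append((roles.get("subj"), words[head], roles.get("obj")))
    return events


def event_conv_features(events, embed, filters, bias=0.0):
    """Apply each filter to one event at a time, then max-pool over events.

    Each convolution window is the concatenated embeddings of one event's
    three slots, so a window follows dependency structure, not word order.
    """
    dim = len(next(iter(embed.values())))
    zero = [0.0] * dim  # used for missing slots and unknown words
    features = []
    for filt in filters:
        activations = []
        for event in events:
            window = []
            for role in event:
                window.extend(embed.get(role, zero))
            score = sum(w * x for w, x in zip(filt, window)) + bias
            activations.append(max(0.0, score))  # ReLU
        features.append(max(activations) if activations else 0.0)
    return features
```

The resulting feature vector (one value per filter) would then be passed to a classifier such as a softmax layer. Because each window is built from a predicate and its arguments, a subject and verb separated by many intervening words still fall into the same window, which a fixed-width n-gram window over the surface order may miss.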