Current research on information extraction focuses mainly on affirmative information, yet natural language text also contains a large amount of negative and uncertain information. To distinguish such information from affirmative information, negation and uncertainty information extraction needs to be studied in depth. For this task, this study constructs, for the first time, a Chinese corpus of 16,841 sentences. Using a sequence labeling model and a convolution tree kernel model, it systematically explores the effectiveness of various serialized dependency features and structured syntactic tree features, and then proposes a meta-decision tree model to combine the two models. Experimental results show that the method achieves 69.84% and 58.57% accuracy (精确率) on negation and uncertainty information extraction, respectively, laying a solid foundation for related research.
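To make the convolution tree kernel idea concrete, the following is a minimal sketch of the standard subset-tree kernel (in the style of Collins and Duffy), which scores two parse trees by counting the tree fragments they share. The abstract does not state which kernel variant, decay factor, or tree representation is used, so the nested-tuple encoding, the function names, and the `lam` parameter here are illustrative assumptions rather than the authors' implementation.

```python
from functools import lru_cache

# Assumed encoding: an internal node is a tuple (label, child, child, ...);
# a word (leaf) is a plain string.

def production(node):
    """Return the grammar production at a node: its label plus, for each
    child, the child's label and whether that child is a leaf word."""
    children = tuple(
        (c, True) if isinstance(c, str) else (c[0], False)
        for c in node[1:]
    )
    return (node[0], children)

def internal_nodes(tree):
    """List every internal (non-leaf) node of the tree."""
    if isinstance(tree, str):
        return []
    found = [tree]
    for child in tree[1:]:
        found.extend(internal_nodes(child))
    return found

def tree_kernel(t1, t2, lam=1.0):
    """Subset-tree kernel: sum over all node pairs of the (decayed) number
    of common fragments rooted at that pair. lam is a hypothetical decay."""

    @lru_cache(maxsize=None)
    def common(n1, n2):
        # Different productions share no fragment rooted here.
        if production(n1) != production(n2):
            return 0.0
        score = lam
        # Same production: each internal child may be expanded or left bare.
        for c1, c2 in zip(n1[1:], n2[1:]):
            if not isinstance(c1, str):
                score *= 1.0 + common(c1, c2)
        return score

    return sum(common(a, b)
               for a in internal_nodes(t1)
               for b in internal_nodes(t2))

# Toy trees (invented for illustration, not taken from the corpus).
tree_a = ("VP", ("ADV", "not"), ("V", "work"))
tree_b = ("VP", ("ADV", "not"), ("V", "run"))
similarity = tree_kernel(tree_a, tree_b)
```

With `lam=1.0` the kernel value is a plain count of shared fragments; in a kernel-based classifier, a value below 1 is often chosen so that large shared fragments do not dominate the similarity.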