该文提出一种基于语言现象的文本蕴涵识别方法,该方法建立了一个语言现象识别和整体推理判断的联合分类模型,目的是对两个高度相关的任务进行统一学习,避免管道模型的错误传播问题并提升系统精度。针对语言现象识别,设计了22个专用特征和20个通用特征;为提高随机森林的泛化能力,提出一种基于特征选择的随机森林生成算法。实验结果表明,基于随机森林的联合分类模型能够有效识别语言现象和总体蕴涵关系。
This paper presents a textual entailment recognition approach based on language phenomena. The approach adopts a joint classification model for language phenomenon recognition and overall entailment judgment, so that the two highly related tasks are learned together, which avoids the error propagation of a pipeline strategy and improves system accuracy. For language phenomenon recognition, 22 specific features and 20 general features are designed. To improve the generalization ability of the random forest, a feature-selection-based algorithm for generating the random forest is proposed. Experimental results show that the joint classification model based on random forest effectively recognizes both language phenomena and the overall entailment relation.