Multiple-choice questions come with explicit candidate answers, which simplifies question classification. Exploiting this property, and targeting the linguistic phenomenon in which the meaning of a long text entails that of a short text, this paper proposes a method that ranks candidate answers by textual entailment strength. In the absence of large-scale question-answer pairs, the Chinese Wikipedia corpus serves as the knowledge source, and geography multiple-choice questions from the college entrance examinations (gaokao) of provinces and cities across China serve as experimental data. The questions are answered with two approaches: sentence similarity and the proposed textual entailment method. Experiments show that the textual entailment method reaches an accuracy of 36.93%, which is 2.44% higher than sentence similarity based on word embeddings and 7.66% higher than sentence similarity based on the vector space model, confirming the effectiveness of the proposed entailment strength computation.
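To make the vector space model baseline concrete, the following is a minimal sketch of ranking candidate answers by cosine similarity between bag-of-words vectors. It is not the paper's entailment strength computation, which the abstract does not specify. The function names (`cosine`, `rank_candidates`), the choice to append each candidate to the question stem, and the toy English tokens are all assumptions made for illustration; real input would be Chinese text that has already been word-segmented.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(question_tokens, candidates, evidence_sentences):
    """Rank candidate answers by their best similarity to any evidence sentence.

    question_tokens: list of tokens in the question stem
    candidates: dict mapping an option label to its token list
    evidence_sentences: list of token lists retrieved from a corpus
    Returns a list of (label, score) pairs, highest score first.
    """
    evidence_vecs = [Counter(s) for s in evidence_sentences]
    scored = []
    for label, cand_tokens in candidates.items():
        # Assumption: the hypothesis is the question stem plus the candidate.
        hypothesis = Counter(question_tokens + cand_tokens)
        best = max((cosine(hypothesis, e) for e in evidence_vecs), default=0.0)
        scored.append((label, best))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data, not taken from the paper's test set.
question = ["longest", "river", "china"]
options = {"A": ["yangtze"], "B": ["yellow", "river"]}
evidence = [
    ["yangtze", "longest", "river", "china"],
    ["yellow", "river", "second", "longest"],
]
ranked = rank_candidates(question, options, evidence)
print(ranked[0][0])  # label of the top-ranked option
```

The top-ranked option is taken as the answer. A method based on entailment strength would replace `cosine` with a directional score, since entailment from a long text to a short one is not symmetric the way cosine similarity is.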