A reading comprehension system analyzes a natural-language passage and automatically extracts or generates answers to questions that users ask about it. This paper proposes an answer-sentence extraction method for English reading comprehension that uses shallow semantic information. The semantic role labeling results for the question and for every candidate sentence are first represented as trees, and a tree kernel is used to compute the semantic structural similarity between the question and each candidate sentence. This similarity is then combined with the number of matching words obtained by the bag-of-words method, and the candidate sentence with the highest combined score is selected as the answer sentence. On the Remedia test corpus, the proposed method achieves 43.3% HumSent accuracy.
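
The pipeline summarized above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the tree layout (a hypothetical `PAS` root over role nodes with word leaves), the choice of a Collins-Duffy style subset-tree kernel with decay `lam`, the normalization, and the linear combination controlled by the hypothetical parameter `weight`. The semantic roles are written by hand here; a real system would obtain them from a semantic role labeler.

```python
from dataclasses import dataclass, field
from math import sqrt
from typing import Dict, List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def semantic_tree(roles: Dict[str, str]) -> Node:
    """Build a toy tree: PAS -> role nodes -> word leaves (assumed layout)."""
    return Node("PAS", [Node(r, [Node(w)]) for r, w in sorted(roles.items())])


def all_nodes(t: Node) -> List[Node]:
    out = [t]
    for c in t.children:
        out.extend(all_nodes(c))
    return out


def production(n: Node):
    return (n.label, tuple(c.label for c in n.children))


def delta(a: Node, b: Node, lam: float) -> float:
    """Weighted count of common subset trees rooted at a and b."""
    if not a.children or not b.children:
        return 0.0  # leaves (words) root no fragments
    if production(a) != production(b):
        return 0.0
    score = lam
    for ca, cb in zip(a.children, b.children):
        score *= 1.0 + delta(ca, cb, lam)
    return score


def tree_kernel(t1: Node, t2: Node, lam: float = 0.4) -> float:
    return sum(delta(a, b, lam) for a in all_nodes(t1) for b in all_nodes(t2))


def normalized_kernel(t1: Node, t2: Node, lam: float = 0.4) -> float:
    """Scale the kernel to [0, 1] so tree size does not dominate."""
    k11, k22 = tree_kernel(t1, t1, lam), tree_kernel(t2, t2, lam)
    if k11 == 0.0 or k22 == 0.0:
        return 0.0
    return tree_kernel(t1, t2, lam) / sqrt(k11 * k22)


def word_overlap(question: str, sentence: str) -> int:
    """Bag-of-words match: number of shared lowercase word types."""
    return len(set(question.lower().split()) & set(sentence.lower().split()))


def score(q_text, q_roles, s_text, s_roles, weight: float = 1.0) -> float:
    # Assumed fusion: word-match count plus weighted structural similarity.
    sim = normalized_kernel(semantic_tree(q_roles), semantic_tree(s_roles))
    return word_overlap(q_text, s_text) + weight * sim


question = ("who chased the cat", {"REL": "chased", "A0": "who", "A1": "cat"})
candidates = [
    ("the dog chased the cat", {"REL": "chased", "A0": "dog", "A1": "cat"}),
    ("the cat chased the dog", {"REL": "chased", "A0": "cat", "A1": "dog"}),
]
scores = [score(*question, text, roles) for text, roles in candidates]
best = max(range(len(candidates)), key=scores.__getitem__)
print(candidates[best][0])
```

In this toy case both candidates share the same three words with the question, so the bag-of-words count alone cannot separate them. The structural term favors the first candidate, whose `A1` argument matches the question's `A1`.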