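For why-questions in reading comprehension question answering, this paper proposes an answer sentence extraction method based on identifying the question topic and the causal rhetorical relations between topics. Extraction relies on machine learning: features that identify the sentence corresponding to the question topic, together with features of the causal relation between the question topic and the sentence context, are used to rank the sentences of a passage by their probability of being the answer sentence. The sentence corresponding to the question topic is identified with similarity measures based on idf and semantic roles; causal rhetorical relations are identified with cue phrases, specific semantic roles, causal relation probabilities between words mined from a document collection, and the position and expression form of the sentence context. Experimental results on the Remedia corpus show that the method clearly improves performance on why-question answering.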
As an important branch of question answering research, automatic reading comprehension (RC) requires a system to read a short passage of text and answer a series of questions about it. Among the question types studied in RC (who, what, when, where, and why), why-questions differ from the others in two respects: extracting their answers requires the discourse structure of the text, and the answer is not a named entity. To address these differences, this paper presents an answer sentence extraction approach for RC why-questions based on identifying the question topic and causal rhetorical relations. The approach uses a machine learning model to rank the sentences of a text by their probability of being the answer sentence. The model uses two kinds of features: one identifies the sentence in the text that corresponds to the question topic, and the other identifies a causal rhetorical relation between the question topic and the sentence context. The first kind comprises idf-based and semantic-role-based similarity features. The second comprises cue phrases, specific semantic roles, causal relation entailment probabilities between words mined from large-scale document collections, and the position and expression format of the sentence context. Experimental results on the Remedia corpus show that the method significantly improves the performance of why-question answering in reading comprehension.
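The ranking idea described above can be illustrated with a minimal sketch. It is not the paper's model: the learned ranker is replaced by a hand-weighted sum, the semantic role and mined causal probability features are omitted, and idf is estimated from the passage itself. The cue phrase list, the `cue_weight` parameter, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter

PUNCT = ".,?!;:\"'"

def tokenize(text):
    """Lowercase whitespace tokenizer that strips surrounding punctuation."""
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]

def idf_table(sentences):
    """Smoothed idf, treating each passage sentence as one document (assumption)."""
    n = len(sentences)
    df = Counter()
    for s in sentences:
        df.update(set(tokenize(s)))
    return {w: math.log((n + 1) / (c + 1)) + 1.0 for w, c in df.items()}

def idf_overlap(question, sentence, idf):
    """Share of the question's idf mass covered by the sentence, in [0, 1].

    Question words that never occur in the passage have no idf entry
    and are ignored.
    """
    q = set(tokenize(question))
    s = set(tokenize(sentence))
    total = sum(idf.get(w, 0.0) for w in q)
    if total == 0.0:
        return 0.0
    return sum(idf.get(w, 0.0) for w in q & s) / total

# Illustrative cue phrases only; the paper's actual list is not given here.
CUE_PHRASES = ("because", "so that", "in order to", "since")

def has_cue(sentence):
    """True if the sentence contains one of the illustrative causal cue phrases."""
    text = " " + " ".join(tokenize(sentence)) + " "
    return any(" " + cue + " " in text for cue in CUE_PHRASES)

def rank_sentences(question, sentences, cue_weight=0.5):
    """Return (score, index, sentence) tuples, highest score first.

    A fixed weighted sum stands in for the learned model.
    """
    idf = idf_table(sentences)
    scored = []
    for i, s in enumerate(sentences):
        score = idf_overlap(question, s, idf) + cue_weight * has_cue(s)
        scored.append((score, i, s))
    return sorted(scored, key=lambda t: (-t[0], t[1]))

if __name__ == "__main__":
    passage = [
        "Tom went to school on Monday.",
        "Tom stayed home because he had a fever.",
        "His mother made soup.",
    ]
    for score, idx, sent in rank_sentences("Why did Tom stay home?", passage):
        print(f"{score:.3f}  [{idx}]  {sent}")
```

In this toy passage the second sentence ranks first because it both overlaps with the question topic and contains a causal cue. A closer approximation of the described approach would feed such feature values into a trained classifier or ranker instead of summing them with a fixed weight.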