Because of the particular characteristics and difficulties of Chinese natural language processing, and the relative scarcity of basic language-processing resources for Chinese, some mature techniques and research results developed abroad cannot be applied directly to Chinese question answering (QA) systems. For Chinese factoid QA systems, this paper presents a new answer extraction algorithm based on syntactic structure feature analysis and classification. The method treats answer extraction as a classification problem over candidate answers, i.e., each candidate answer is classified as either correct or incorrect. First, using the candidate answer type information (part-of-speech information) that corresponds to the question type, the method extracts candidate answers from text snippets together with their simple features and syntactic structure features in the sentence. These features are then used to train a classifier. Finally, the trained classifier judges whether each candidate answer is correct. On Chinese factoid questions, compared with a typical current pattern-matching-based Chinese answer extraction algorithm, the method improves precision by 6.2% and MRR by 9.7%.
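The classification view of answer extraction described above, together with the MRR metric used in the evaluation, can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the abstract does not name the classifier or the exact features, so a plain perceptron stands in for the classifier, and the two-element feature vector (a type-match flag and a syntactic-closeness score) is hypothetical, as are all names such as `Candidate` and `train_perceptron`. MRR is computed in its standard form, the mean over questions of the reciprocal rank of the first correct answer.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias)


@dataclass
class Candidate:
    text: str
    # Hypothetical features: [answer-type match, syntactic closeness to the question focus]
    features: List[float]
    is_correct: bool = False


def score(model: Model, cand: Candidate) -> float:
    """Linear score; a positive value means 'classified as correct'."""
    weights, bias = model
    return sum(w * x for w, x in zip(weights, cand.features)) + bias


def train_perceptron(examples: Sequence[Candidate], epochs: int = 200) -> Model:
    """Train a perceptron as a stand-in binary classifier (an assumption,
    not necessarily the classifier used in the paper)."""
    weights = [0.0] * len(examples[0].features)
    bias = 0.0
    for _ in range(epochs):
        for ex in examples:
            target = 1 if ex.is_correct else -1
            # Update only on misclassified (or zero-score) examples.
            if target * score((weights, bias), ex) <= 0:
                weights = [w + target * x for w, x in zip(weights, ex.features)]
                bias += target
    return weights, bias


def rank_candidates(model: Model, candidates: Sequence[Candidate]) -> List[Candidate]:
    """Order candidates from most to least likely correct."""
    return sorted(candidates, key=lambda c: score(model, c), reverse=True)


def mean_reciprocal_rank(ranked_lists: Sequence[Sequence[Candidate]]) -> float:
    """MRR: average of 1/rank of the first correct answer (0 if none is found)."""
    total = 0.0
    for ranked in ranked_lists:
        for rank, cand in enumerate(ranked, start=1):
            if cand.is_correct:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)


# Toy, hand-made training candidates (illustrative only, not data from the paper).
TRAIN = [
    Candidate("candidate A", [1.0, 0.9], True),
    Candidate("candidate B", [1.0, 0.2], False),
    Candidate("candidate C", [0.0, 0.8], False),
    Candidate("candidate D", [1.0, 0.8], True),
]

if __name__ == "__main__":
    model = train_perceptron(TRAIN)
    for cand in rank_candidates(model, TRAIN):
        label = "correct" if score(model, cand) > 0 else "incorrect"
        print(cand.text, label)
```

In a real system the feature vectors would come from parsing the snippet sentences, and the candidates for each question would be ranked by classifier score before MRR is computed over the whole question set.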