This paper presents a question-answer pattern extraction technique, based on an unsupervised learning algorithm, that acquires answer patterns from the Web for the answer extraction module of a Chinese question answering (QA) system. The algorithm avoids a key drawback of supervised approaches: it does not require users to supply 〈question, answer〉 pairs as a training set. Given only two or more example questions for each question type, it learns the answer patterns for that type through Web retrieval, topic segmentation, pattern extraction, vertical clustering, and horizontal clustering. Experimental results show that the proposed unsupervised pattern learning method is effective, and that pattern-based answer extraction considerably improves the performance of the Chinese QA system.
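To make the idea of pattern-based answer extraction concrete, the following is a minimal sketch of how already-learned answer patterns might be applied to retrieved text. It is not the algorithm of this paper: the slot notation `<Q>` and `<A>`, the function names `compile_answer_pattern` and `extract_answers`, the punctuation-based answer boundary, and the simple vote counting are all assumptions made for illustration, and the learning steps (topic segmentation, vertical and horizontal clustering) are not shown.

```python
import re
from collections import Counter

# Assumed answer boundary: a run of characters that stops at whitespace
# or at common Chinese/English punctuation. This is an illustrative choice.
ANSWER_SLOT = r"(?P<answer>[^\s,.;,。;]+)"


def compile_answer_pattern(pattern, question_term):
    """Turn a surface pattern with one <Q> and one <A> slot into a regex."""
    regex = ""
    for part in re.split(r"(<Q>|<A>)", pattern):
        if part == "<Q>":
            regex += re.escape(question_term)  # literal question term
        elif part == "<A>":
            regex += ANSWER_SLOT               # capture the candidate answer
        else:
            regex += re.escape(part)           # literal context words
    return re.compile(regex)


def extract_answers(patterns, question_term, text):
    """Return candidate answers, most frequently matched first."""
    votes = Counter()
    for pattern in patterns:
        regex = compile_answer_pattern(pattern, question_term)
        for match in regex.finditer(text):
            votes[match.group("answer")] += 1
    return [answer for answer, _ in votes.most_common()]


if __name__ == "__main__":
    # Hypothetical patterns for a "birthplace" question type.
    patterns = ["<Q>出生于<A>", "<Q> was born in <A>"]
    print(extract_answers(patterns, "莫扎特", "莫扎特出生于萨尔茨堡。"))
    # prints ['萨尔茨堡']
```

In this sketch each pattern casts one vote per match; a real system would more plausibly weight patterns by a confidence score estimated during learning.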