为了解决在汉语问答系统答案提取时,由于词的同义或多义现象而导致的“漏提”或“错提”等问题,提出了一种基于潜在语义分析(LSA)的问题和答案句子相似度计算方法.它利用空间向量模型作为问题和句子的表示方法,借助于潜在语义分析理论,对大量问答作句子语料统计分析,构建了一个潜在的词一句子语义空间,从而消除了词之间的相关性,并在语义空间上实现了问题与答案句子相似度计算,有效地解决了词的同义和多义问题.最后结合问题类型和相似度计算结果,对汉语基于事实的简单陈述问题进行了答案句子提取实验.答案提取的MRR值达到了0.47,明显优于空间向量模型.结果说明该方法具有很好的效果.
When extracting answers in Chinese question-answering system, synonymy will cause to lose several correct answers, and polysemy will cause to extract wrong answers. In order to solve these problems, this paper proposes a method to calculate similarity between question and sentence based on Latent Semantic Analysis (LSA). This method represents the question and sentence with space vector model, statistically analyzes the abundant question-answering sentence pair corpus with the help of latent semantic analysis theory, and constructs a latent word-sentence semantic space, which gets rids of the correlativity between word. And then similarity calculation between question and sentence is implemented in this semantic space. So the question of synonymy and polysemy is solved effectively. Finally, combining question type and similarity between question and sentence, the experiment on extracting sentence as answer for Chinese factoid question is done. The MRR value with LSA is 0.47, which is better than VSM obviously. The results show that this method makes a very better effect.