Traditional search engines return so many results that users often cannot quickly find the answer they need. To address this, the paper introduces an FAQ system and focuses on how to find the best-matching answer within it. The traditional VSM model is improved so that it better reflects the weights of the terms in a question. An LDA model is then introduced and trained on documents from the computer malfunction domain, yielding a topic-term probability distribution. From the term probabilities in this distribution, the relevance between words is computed, and an algorithm is proposed that derives the similarity between sentences from the relevance between their words. The two algorithms are combined to obtain the final similarity algorithm. An FAQ collection is compiled and a prototype FAQ question answering system is implemented. Experimental analysis indicates that the similarity algorithm performs well.
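The combination described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything concrete here is an assumption: the toy topic-term probabilities in `TOPIC_WORD`, the use of cosine similarity over per-topic probabilities as word relevance, the best-match averaging used for sentence similarity, plain IDF-weighted term counts standing in for the improved VSM, and the linear weight `alpha`. All function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical topic-term distribution P(term | topic), standing in for the
# output of an LDA model trained on computer-fault documents. Values are made up.
TOPIC_WORD = {
    "display": {"screen": 0.40, "monitor": 0.35, "black": 0.20, "boot": 0.05},
    "startup": {"boot": 0.50, "start": 0.30, "black": 0.10, "screen": 0.10},
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def word_vector(word):
    """Represent a word by its probability under each topic (assumed representation)."""
    return [TOPIC_WORD[topic].get(word, 0.0) for topic in sorted(TOPIC_WORD)]


def word_relevance(w1, w2):
    """Assumed word-word relevance: cosine of the two per-topic probability vectors."""
    if w1 == w2:
        return 1.0
    return cosine(word_vector(w1), word_vector(w2))


def topic_similarity(s1, s2):
    """Sentence similarity from word relevance: average best match, made symmetric."""
    if not s1 or not s2:
        return 0.0

    def directed(a, b):
        return sum(max(word_relevance(x, y) for y in b) for x in a) / len(a)

    return (directed(s1, s2) + directed(s2, s1)) / 2


def vsm_similarity(s1, s2, idf):
    """Plain VSM cosine with IDF-weighted counts (a stand-in for the improved VSM)."""
    c1, c2 = Counter(s1), Counter(s2)
    vocab = sorted(set(c1) | set(c2))
    u = [c1[w] * idf.get(w, 1.0) for w in vocab]
    v = [c2[w] * idf.get(w, 1.0) for w in vocab]
    return cosine(u, v)


def combined_similarity(s1, s2, idf, alpha=0.5):
    """Assumed linear combination of the VSM score and the topic-based score."""
    return alpha * vsm_similarity(s1, s2, idf) + (1 - alpha) * topic_similarity(s1, s2)


if __name__ == "__main__":
    query = ["screen", "black"]
    faq = [["monitor", "black"], ["boot", "start"]]
    ranked = sorted(faq, key=lambda q: combined_similarity(query, q, {}), reverse=True)
    print(ranked[0])
```

With these toy numbers, a question sharing a term and a topic with the query ranks above one that shares neither. In practice the topic-term table would come from a trained LDA model, and `alpha` would need tuning on held-out FAQ pairs.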