FAQ Question-Answering System Based on VSM and LDA Models

ISSN: 1673-629X
Journal: Computer Technology and Development (《计算机技术与发展》)
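Classification: TP31 [Automation and Computer Technology: Computer Software and Theory; Automation and Computer Technology: Computer Science and Technology]
Author affiliation: [1] School of Computer Science and Technology, Anhui University, Hefei, Anhui 230601
Funding: Natural Science Foundation of Anhui Province (11040606M133)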

Keywords: VSM, similarity calculation, LDA (Latent Dirichlet Allocation), topic-term distribution

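Chinese abstract (translated):

Traditional search engines return so much data that users often cannot quickly find the answer they need. To address this, the paper introduces an FAQ system; its main focus is how to find the best-matching answer in the FAQ. The traditional VSM model is improved so that it better reflects the weights of the words in a question. The paper then introduces the LDA model and trains it on documents from the computer-fault domain to obtain topic-word probability distributions. From the word probabilities in these distributions, the relatedness between words is computed, and an algorithm is proposed that derives sentence-to-sentence similarity from word-to-word relatedness. The two algorithms are combined to give the final similarity algorithm. The FAQ collection is organized, and a prototype FAQ question-answering system is built. Experimental analysis shows that the similarity algorithm performs well.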

English abstract:

The data returned by a traditional search engine is so large that users sometimes cannot quickly find the answer they want. An FAQ system is introduced to address this, and the focus of the paper is how to find the best match within it. An improved VSM model is presented that better reflects the weights of the terms in a question. An LDA model, trained on documents from the computer malfunction domain, generates a topic-term probability distribution from which the relevance between words is calculated. An algorithm is then presented that calculates the similarity between sentences from the relevance between words. The two algorithms are combined to obtain the final similarity algorithm. FAQ data is collected and a prototype FAQ answering system is implemented. Experiments show that the algorithm performs well.
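
The pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: plain smoothed TF-IDF stands in for the improved VSM weighting, word relatedness is taken as the cosine between the two words' probabilities across topics, sentence similarity averages best-match word relatedness in both directions, and the two scores are mixed with a hypothetical weight `alpha`. The topic-word table `TOPIC_WORD` is a hand-written toy, not the output of a trained LDA model, and all function names are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(token_lists):
    """Plain smoothed TF-IDF (assumption: the paper's improved weighting is not given)."""
    n = len(token_lists)
    df = Counter(w for tokens in token_lists for w in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        # Smoothed IDF so a word present in every question keeps a nonzero weight.
        vectors.append({w: (c / len(tokens)) * (math.log((1 + n) / (1 + df[w])) + 1)
                        for w, c in tf.items()})
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def word_relatedness(w1, w2, topic_word):
    """Assumed measure: cosine between the two words' probabilities across topics."""
    if w1 == w2:
        return 1.0
    p1 = {k: dist.get(w1, 0.0) for k, dist in enumerate(topic_word)}
    p2 = {k: dist.get(w2, 0.0) for k, dist in enumerate(topic_word)}
    return cosine(p1, p2)

def topic_sentence_similarity(s1, s2, topic_word):
    """Assumed aggregation: average best-match word relatedness, in both directions."""
    if not s1 or not s2:
        return 0.0

    def directed(a, b):
        return sum(max(word_relatedness(x, y, topic_word) for y in b) for x in a) / len(a)

    return 0.5 * (directed(s1, s2) + directed(s2, s1))

def combined_similarity(query, faq_questions, topic_word, alpha=0.5):
    """Score each FAQ question as alpha * VSM + (1 - alpha) * topic-based similarity."""
    vectors = tfidf_vectors(faq_questions + [query])
    query_vec = vectors[-1]
    return [alpha * cosine(query_vec, vec)
            + (1 - alpha) * topic_sentence_similarity(query, question, topic_word)
            for vec, question in zip(vectors, faq_questions)]

# Toy topic-word distributions (hypothetical values, one dict per topic).
TOPIC_WORD = [
    {"screen": 0.4, "monitor": 0.4, "black": 0.2},
    {"disk": 0.5, "drive": 0.4, "noise": 0.1},
]
# Pre-tokenized FAQ questions and a user query.
FAQ = [["monitor", "black"], ["disk", "noise"]]
scores = combined_similarity(["screen", "black"], FAQ, TOPIC_WORD)
best = max(range(len(FAQ)), key=scores.__getitem__)
print(best, scores)
```

In this toy run the query shares only the word "black" with the first FAQ question, yet the topic-based term also credits "screen" against "monitor" because both words carry probability in the same topic. That is the kind of match a purely term-based VSM score would miss, which is the motivation the abstract gives for combining the two scores.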