Community question answering (cQA) services hold large numbers of user-generated questions and answers. Users can either post a new question and wait for other users to answer it, or search for questions related or similar to their own and reuse the existing answers. As cQA services have grown, users have come to care more about the quality of question search, so retrieving related or similar questions accurately and effectively has become a core task of these services. This paper analyzes the characteristics of user questions in cQA services and proposes a method for determining the importance of each term in a question. The resulting term weights are used to improve how traditional question retrieval models compute the relevance between the user's question and the candidate question set, which improves retrieval quality. Experimental results show that the proposed method outperforms the baseline on MAP, MRR, and R-precision. The paper also analyzes the features that influence term importance and identifies the optimal feature set.
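
For readers unfamiliar with the three evaluation measures named above, the following is a minimal sketch of their standard definitions for ranked retrieval. It is not taken from the paper: the function names, the question identifiers, and the assumption that each ranked list contains no duplicates are illustrative only.

```python
from typing import Dict, List, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Sum of precision@k at each rank k holding a relevant item,
    divided by the total number of relevant items."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant item, or 0 if none is retrieved."""
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / k
    return 0.0


def r_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Precision over the top R results, where R is the number of relevant items."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for item in ranked[:r] if item in relevant) / r


def evaluate(runs: List[Tuple[Sequence[str], Set[str]]]) -> Dict[str, float]:
    """Average each measure over all queries.

    `runs` holds one (ranked list, relevant set) pair per query and must be
    non-empty; ranked lists are assumed to contain no duplicate items.
    """
    n = len(runs)
    return {
        "MAP": sum(average_precision(r, rel) for r, rel in runs) / n,
        "MRR": sum(reciprocal_rank(r, rel) for r, rel in runs) / n,
        "R-precision": sum(r_precision(r, rel) for r, rel in runs) / n,
    }


# Hypothetical example: two queries with made-up candidate question IDs.
example_runs = [
    (["q3", "q1", "q7", "q2"], {"q1", "q2"}),
    (["a", "b"], {"a"}),
]
scores = evaluate(example_runs)
```

MAP rewards placing all relevant questions high in the list, MRR looks only at the first relevant hit, and R-precision measures precision at a cutoff equal to the number of relevant questions, so the three measures can disagree about which system is better.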