Improving the relevance of the results returned by information retrieval systems has long been an important research topic in the field. In this paper, we first introduce the notion of query term occurrence information, then give a formal representation of the query term occurrence weight, and finally combine it with the BM25 model. We compute and integrate the query term occurrence weight in two ways: linear weighting and factor weighting. Experiments on the GOV2 data set show that, with either method, adding the query term occurrence weight effectively improves the relevance of the search results. For the TREC 2005 queries, MAP improves by 15.78% and P@10 improves by 34.68%. The methods described in this paper have been applied to the Web Track at TREC 2009.
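The abstract names two ways of combining a query term occurrence weight with BM25 but does not state the formulas. The following Python sketch illustrates one possible reading and is not the paper's actual definition. The occurrence weight used here (the fraction of distinct query terms that appear in the document), the additive form with parameter `lam`, the multiplicative form with parameter `alpha`, and all function names are assumptions made for illustration. Only the BM25 part follows the standard Okapi formulation, with the usual defaults `k1 = 1.2` and `b = 0.75`.

```python
import math
from collections import Counter


def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    """Standard Okapi BM25 score of one tokenized document for a tokenized query."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    tf = Counter(doc)
    score = 0.0
    for term in set(query):
        # Document frequency: number of documents containing the term.
        df = sum(1 for d in docs if term in d)
        if df == 0 or tf[term] == 0:
            continue
        # Non-negative idf variant (adds 1 inside the logarithm).
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
        score += idf * tf[term] * (k1 + 1) / norm
    return score


def occurrence_weight(query, doc):
    """ASSUMPTION: fraction of distinct query terms that occur in the document.

    The paper's own definition is not given in the abstract.
    """
    terms = set(query)
    if not terms:
        return 0.0
    present = set(doc)
    return sum(1 for t in terms if t in present) / len(terms)


def linear_score(bm25, weight, lam=0.5):
    """ASSUMPTION: linear weighting adds a scaled occurrence weight to BM25."""
    return bm25 + lam * weight


def factor_score(bm25, weight, alpha=1.0):
    """ASSUMPTION: factor weighting multiplies BM25 by a boost factor."""
    return bm25 * (1.0 + alpha * weight)


if __name__ == "__main__":
    docs = [["trec", "web", "track"], ["web", "search"], ["bm25", "ranking"]]
    query = ["web", "track"]
    for doc in docs:
        base = bm25_score(query, doc, docs)
        w = occurrence_weight(query, doc)
        print(doc, round(linear_score(base, w), 3), round(factor_score(base, w), 3))
```

Under these assumptions, both variants favor documents that contain more of the query's distinct terms. The multiplicative form leaves a document with a zero BM25 score at zero, while the additive form shifts scores by an amount that does not depend on the BM25 value.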