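In distributed retrieval, the topic-based language model method for collection selection first introduces the relevance model to compute the similarity between the user query and the documents in each information collection. On this basis, it obtains topic information for the documents in the collection through text clustering and incorporates that information into the language model to compute a query-relevance ranking of the information collections, which completes collection selection. Experiments show that, compared with CORI, CRCS, and collection selection based on the traditional language model, the method significantly improves retrieval effectiveness.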
The cluster-based language model for resource selection in distributed information retrieval first uses the relevance model to estimate the probability of relevance between each document in a collection and a given query. The collection is then partitioned into clusters, each of which is used to construct a more precise language model. Finally, databases are ranked and selected by their similarity to the query under the estimated language model. Experimental results show that our approach consistently improves retrieval performance over CORI, CRCS, and the traditional language model approach.
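The pipeline summarized above can be illustrated with a minimal sketch. It is not the method's actual formulation: the abstract gives no formulas, so the interpolation of document, cluster, and background unigram models, the weights `lam_doc` and `lam_cluster`, the averaging of per-document query likelihoods into a collection score, and the function names `unigram` and `rank_collections` are all assumptions made for illustration. Cluster assignments are taken as given input rather than computed, and a plain query-likelihood score stands in for the relevance model.

```python
from collections import Counter


def unigram(tokens):
    """Maximum-likelihood unigram distribution over a non-empty token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def rank_collections(query, dbs, lam_doc=0.5, lam_cluster=0.3):
    """Rank collections by an assumed cluster-smoothed query likelihood.

    dbs maps a collection name to a list of (tokens, cluster_id) pairs.
    The remaining weight 1 - lam_doc - lam_cluster goes to a background
    model estimated over all collections (an assumption of this sketch).
    """
    lam_bg = 1.0 - lam_doc - lam_cluster
    background = unigram(
        [t for docs in dbs.values() for tokens, _ in docs for t in tokens]
    )

    scores = {}
    for name, docs in dbs.items():
        # Pool the tokens of each cluster to build one model per cluster.
        pooled = {}
        for tokens, cid in docs:
            pooled.setdefault(cid, []).extend(tokens)
        cluster_lm = {cid: unigram(toks) for cid, toks in pooled.items()}

        total = 0.0
        for tokens, cid in docs:
            doc_lm = unigram(tokens)
            likelihood = 1.0
            for w in query:
                # Interpolate document, cluster, and background estimates.
                likelihood *= (
                    lam_doc * doc_lm.get(w, 0.0)
                    + lam_cluster * cluster_lm[cid].get(w, 0.0)
                    + lam_bg * background.get(w, 0.0)
                )
            total += likelihood
        # Collection score: mean query likelihood of its documents.
        scores[name] = total / len(docs)

    return sorted(scores, key=scores.get, reverse=True)
```

Calling `rank_collections(query_tokens, dbs)` returns the collection names ordered from most to least similar to the query. Averaging the per-document likelihoods is only one possible aggregation; other choices, such as summing over the top-ranked documents, would fit the same description.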