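A key problem in information retrieval is that the words used in a query may not match the words used in the document collection, which degrades retrieval effectiveness. To address this problem, a context-based query expansion method is proposed. The method selects expansion terms according to the contextual information of the query, taking into account both the relation of each expansion term to the whole query and its positional relation to the query terms. Experiments on TREC information retrieval test collections show that the method achieves an improvement of 5%~19% over the usual simple language model. Compared with the popular pseudo-feedback-based query expansion methods, the proposed method also attains comparable average precision.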
The effectiveness of information retrieval (IR) systems is influenced by the degree of term overlap between user queries and relevant documents. Query-document term mismatch, whether partial or total, is a fact that IR systems must deal with, and query expansion (QE) is one method for doing so. Classical query expansion techniques such as local context analysis use term co-occurrence statistics to incorporate additional contextual terms and thereby enhance passage retrieval. However, relevant contextual terms do not always co-occur frequently with the query terms, and terms that co-occur frequently with the query terms are not always relevant. Hence such methods often bring in noise, which reduces precision. Based on an analysis of how queries are produced, the authors propose a new query expansion method that uses context and global information. Expansion terms are selected according to their relation to the whole query, and the positional relation between terms is also considered. Experimental results on TREC data collections show that the proposed method outperforms the language model without expansion by 5%-19%. Compared with pseudo feedback, a popular query expansion approach, the method achieves competitive average precision.
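
The selection idea summarized above, relating each candidate term to the whole query while taking term positions into account, can be illustrated with a minimal sketch. This is not the authors' model: the function name `score_expansion_terms`, the `window` parameter, the 1/distance discount, and the log-sum combination are all assumptions made for illustration only.

```python
import math
from collections import defaultdict


def score_expansion_terms(query_terms, documents, window=5):
    """Rank candidate expansion terms (hypothetical scoring, not the paper's).

    documents is a list of token lists. A candidate gains weight each time it
    appears within `window` positions of a query term, discounted by distance.
    """
    query = set(query_terms)
    # proximity[candidate][query_term]: distance-discounted co-occurrence mass
    proximity = defaultdict(lambda: defaultdict(float))
    for tokens in documents:
        q_positions = [(j, t) for j, t in enumerate(tokens) if t in query]
        for i, cand in enumerate(tokens):
            if cand in query:
                continue  # query terms are not expansion candidates
            for j, q in q_positions:
                dist = abs(i - j)  # always >= 1, since cand is not a query term
                if dist <= window:
                    proximity[cand][q] += 1.0 / dist
    # Summing log(1 + mass) over all query terms favors candidates that are
    # related to the whole query rather than to a single query term.
    scores = {
        cand: sum(math.log1p(per_q.get(q, 0.0)) for q in query)
        for cand, per_q in proximity.items()
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    docs = [
        ["cheap", "flight", "ticket", "booking"],
        ["flight", "delay", "weather"],
        ["cheap", "hotel", "deal"],
    ]
    for term, score in score_expansion_terms(["cheap", "flight"], docs):
        print(f"{term}\t{score:.3f}")
```

In this toy setup a term that appears near both query terms is ranked above terms that appear near only one of them, which is the intuition behind selecting expansion terms by their relation to the whole query.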