A major cause of poor retrieval quality in text document information retrieval (IR) is that users find it difficult to formulate a query expression that accurately describes their query intent. XML documents have structural characteristics in addition to the content characteristics of text documents, which makes it even harder for users to formulate accurate query expressions. To address this problem, this paper proposes a query expansion method based on relevance feedback that helps users construct a "content + structure" query expression matching their intent. The method first performs keyword expansion to find the weighted expansion terms that best represent the user's query intent, then performs structural expansion on the basis of those expanded terms, and finally forms a complete content-and-structure expanded query expression. Experimental results show that, compared with the original query without expansion, the average precision at prec@10 and prec@20 improves by more than 30%.
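The two-step procedure described above can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not give the weighting formula or the structure-selection rule, so the sketch assumes that terms are weighted by the fraction of feedback elements containing them and that the target structure is the element path whose feedback elements best match the weighted terms. The function names (`expand_terms`, `expand_structure`, `build_cas_query`), the stopword list, and the feedback data are all hypothetical.

```python
from collections import Counter

# Hypothetical relevance feedback: (element path, element text) pairs the
# user judged relevant. Paths and texts are invented for illustration.
FEEDBACK = [
    ("/article/sec/p", "xml retrieval uses structure and content"),
    ("/article/sec/p", "query expansion improves xml retrieval"),
    ("/article/abs", "relevance feedback for query expansion"),
]

# Assumed toy stopword list; a real system would use a fuller one.
STOPWORDS = {"and", "for", "uses", "the", "of"}


def expand_terms(query_terms, feedback, n_new=2):
    """Step 1 (assumed scheme): weight each term by the fraction of
    feedback elements that contain it, then add the n_new best new terms."""
    doc_freq = Counter()
    for _, text in feedback:
        doc_freq.update(set(text.split()) - STOPWORDS)
    weights = {t: doc_freq[t] / len(feedback) for t in doc_freq}
    # Sort by descending weight, then alphabetically so ties are deterministic.
    candidates = sorted(
        (t for t in weights if t not in query_terms),
        key=lambda t: (-weights[t], t),
    )
    expanded = {t: 1.0 for t in query_terms}  # original terms keep weight 1
    expanded.update({t: weights[t] for t in candidates[:n_new]})
    return expanded


def expand_structure(expanded_terms, feedback):
    """Step 2 (assumed scheme): score each element path by the summed
    weights of expanded terms found in its feedback elements."""
    scores = Counter()
    for path, text in feedback:
        words = set(text.split())
        scores[path] += sum(w for t, w in expanded_terms.items() if t in words)
    # Highest score wins; ties are broken by path name.
    return min(scores, key=lambda p: (-scores[p], p))


def build_cas_query(query_terms, feedback):
    """Combine both steps into a (target path, weighted terms) pair."""
    terms = expand_terms(query_terms, feedback)
    return expand_structure(terms, feedback), terms


path, terms = build_cas_query(["xml", "retrieval"], FEEDBACK)
print(path, terms)
```

Structural expansion runs after term expansion in this sketch, mirroring the order in the abstract: the path is chosen using the weighted terms, so the added terms influence which element type the query targets.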