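In recent years, query expansion methods have been shown to effectively improve the average performance of patent retrieval. However, most query expansion methods select expansion terms using only the experimental data set, and few studies have applied external information sources to patent retrieval to improve retrieval accuracy. Therefore, in addition to the experimental data set, this paper adopts a method that exploits external resources to improve patent retrieval performance. The method uses the Google search engine to improve the performance of query expansion methods, employs the learning to rank method LambdaMART to combine different query expansion methods, and weights the different text fields in the information sources, thereby improving patent retrieval performance. Experimental results on TREC data sets show that query reformulation based on the information resources adopted in this paper effectively improves patent retrieval performance.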
Query expansion methods have been shown to improve the average performance of patent retrieval, but most of them select expansion terms from a single source of information, the experimental data set. In contrast, in this paper we propose a method that exploits external resources, in addition to the experimental data set, to improve patent retrieval. We present a learning to rank framework that optimizes the combination of information sources used to select effective query expansion terms. The Google search engine serves as the external resource for enhancing the performance of query expansion methods. We use the learning to rank method LambdaRank to combine different query expansion methods with different text field weighting strategies over the information resources. Experiments on TREC data sets show that our method for query formulation effectively improves patent retrieval performance.