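The summary-based retrieval model rests on the assumption that words appearing in a document's summary express the topic of the document better than words that do not, and therefore contribute more to retrieval. Two summary-based language models for retrieval are proposed: one replaces the document model with the summary model and retrieves documents directly (SQL); the other uses the summary model to smooth the document model (SBDM). Experiments on TREC data sets show that these models can improve retrieval performance. In particular, the performance of SBDM is consistently close to or better than that of the traditional standard document-query similarity model. The work makes two contributions: first, it proposes retrieval-oriented summary extraction methods and examines their effect on retrieval performance; second, it proposes a new retrieval model, namely the summary-based retrieval model.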
Summary-based retrieval rests on the hypothesis that terms appearing in a document's summary are more important than terms that do not appear in it. Recent developments in the language modeling approach to information retrieval have motivated the study of this problem within that retrieval framework. This paper investigates two approaches to summary-based retrieval: ranking documents directly with the summary model (SQL) and smoothing the document model with the summary model (SBDM). Results on TREC collections show that the proposed summary-based models perform consistently across collections and that significant improvements over document-based retrieval can be obtained. The paper makes two main contributions. First, retrieval-oriented summarization methods are proposed and their effect on information retrieval is examined. Second, a new retrieval model, the summary-based retrieval model, is proposed.
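The abstract does not give formulas for the two approaches, so the following is only a minimal sketch of how they might look inside a unigram query-likelihood framework. It assumes linear (Jelinek-Mercer style) interpolation; the function names `score_sql` and `score_sbdm`, the mixture weights, and the use of a collection model as a background are illustrative assumptions, not the paper's actual estimators.

```python
import math
from collections import Counter


def unigram(tokens):
    """Maximum-likelihood unigram model: term -> relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def log_likelihood(query, weighted_models):
    """Log query likelihood under a linear mixture of unigram models.

    weighted_models is a list of (model, weight) pairs whose weights
    are expected to sum to 1.
    """
    score = 0.0
    for term in query:
        p = sum(weight * model.get(term, 0.0) for model, weight in weighted_models)
        if p == 0.0:
            # Term unseen in every component, including the collection.
            return float("-inf")
        score += math.log(p)
    return score


def score_sql(query, summary, collection, lam=0.7):
    """Assumed form of SQL: the summary model stands in for the document
    model and is smoothed only with the collection model."""
    return log_likelihood(
        query,
        [(unigram(summary), lam), (unigram(collection), 1.0 - lam)],
    )


def score_sbdm(query, doc, summary, collection, w_doc=0.5, w_sum=0.3):
    """Assumed form of SBDM: the document model is smoothed with the
    summary model, and both are backed off to the collection model."""
    w_col = 1.0 - w_doc - w_sum
    return log_likelihood(
        query,
        [
            (unigram(doc), w_doc),
            (unigram(summary), w_sum),
            (unigram(collection), w_col),
        ],
    )
```

Under these assumptions, two documents with identical term counts are ranked differently when the query term occurs in only one of the two summaries, which is the behavior the stated hypothesis calls for. The weights here are arbitrary placeholders and would need tuning on held-out queries.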