对第一轮检索的结果文档进行重新排序,以提高顶端结果的准确率,一直是信息检索研究中的基础和关键热点问题。文章在考虑文档与文档的基础上,充分考虑了文档与关键词项以及词项与词项之间的多种关系,提出了一种基于流形学习的检索结果重排序的方法。将文档-文档,文档-关键词项,以及词项-词项这三种关系利用流形学习模型进行融合,然后通过正则化框架,在第一轮检索结果分数的基础上,进行文档重排序。在CLEF数据集上进行的实验表明,与基于图的文档重排序,基于LDA模型的文档重排序等方法相比,文中提出的方法可以更好地提高检索准确率。特别是在奥地利图书馆数据集中,采用MRR评估方法,文章所提出方法的准确率比表现最好的基线系统提高了11.78%,比第一轮检索结果提高了33.46%。
Re-ranking the documents returned by a first-round retrieval, so as to improve the precision of the top-ranked results, has long been a fundamental and active research problem in information retrieval. In addition to the relationships between documents, we take into account the relationships between documents and terms and between terms, and propose a manifold-learning method for document re-ranking. The method fuses these three kinds of relationships (document-document, document-term, and term-term) in a manifold-learning model and then, through a regularization framework, re-ranks the documents on the basis of their first-round retrieval scores. Experiments on the CLEF dataset show that the method improves retrieval precision more than representative baselines such as graph-based re-ranking and LDA-based re-ranking. In particular, on the Austrian National Library dataset, measured by MRR, the method achieves 11.78% higher precision than the best-performing baseline and 33.48% higher precision than the initial ranking.
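To make the general idea concrete, the following is a minimal sketch of score propagation over a document graph within a regularization framework, together with the MRR measure. It is not the model proposed in this paper: it uses only a document-document affinity matrix and the standard manifold-ranking iteration f <- alpha * S * f + (1 - alpha) * y, whereas the proposed method also fuses document-term and term-term relationships. The function names, the parameter alpha, and the toy affinity values are all assumptions introduced for illustration.

```python
import math


def normalize_affinity(W):
    """Symmetric normalization: S[i][j] = W[i][j] / sqrt(d_i * d_j)."""
    n = len(W)
    degrees = [sum(row) for row in W]
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if W[i][j] and degrees[i] > 0 and degrees[j] > 0:
                S[i][j] = W[i][j] / math.sqrt(degrees[i] * degrees[j])
    return S


def manifold_rerank(W, initial_scores, alpha=0.5, iterations=100):
    """Propagate first-round scores over the document graph (illustrative only).

    W is a symmetric, non-negative document-document affinity matrix
    (hypothetical input); alpha in (0, 1) balances graph smoothness
    against fidelity to the initial scores.
    """
    S = normalize_affinity(W)
    n = len(initial_scores)
    f = list(initial_scores)
    for _ in range(iterations):
        f = [
            alpha * sum(S[i][j] * f[j] for j in range(n))
            + (1 - alpha) * initial_scores[i]
            for i in range(n)
        ]
    # Document indices sorted by propagated score, best first.
    order = sorted(range(n), key=lambda i: f[i], reverse=True)
    return order, f


def mean_reciprocal_rank(rankings, relevant_sets):
    """MRR: mean over queries of 1 / rank of the first relevant document."""
    total = 0.0
    for ranking, relevant in zip(rankings, relevant_sets):
        for rank, doc in enumerate(ranking, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(rankings)


# Toy example: document 2 is linked to the top-scored document 0,
# while document 1 is isolated.
W = [[0, 0, 1],
     [0, 0, 0],
     [1, 0, 0]]
order, scores = manifold_rerank(W, [1.0, 0.5, 0.4])
```

In this toy graph, document 2 starts below document 1 but overtakes it after propagation because it is connected to the highest-scored document, which is the kind of effect a graph-based re-ranking step is meant to produce.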