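Existing learning-to-rank algorithms ignore the differences among queries: when building a ranking model, they treat all queries in the training set, together with their associated documents, equally, which degrades the performance of the ranking model. This paper characterizes the differences among queries and, by taking them into account during training, proposes a supervised-learning-based method for aggregating multiple ranking models. The method first trains one sub-ranking model from each query and its associated documents, then converts the output of each sub-ranking model into feature data that reflects query differences, and finally uses a supervised learning method to aggregate the ranking models. Furthermore, exploiting the characteristics of the ranking problem, the paper proposes an aggregation function that directly optimizes ranking performance to fuse the sub-ranking models, and optimizes its lower-bound function by gradient ascent. The paper proves that fusing the sub-ranking models with this directly optimized aggregation function performs better than linearly combining them. Experimental results on relatively large-scale real-world data show that the multi-model aggregation method that directly optimizes the performance measure achieves better ranking performance than conventional learning-to-rank models.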
In ranking for document retrieval, queries often vary greatly from one to another. Most existing approaches nevertheless treat the losses from different queries identically. We find that a supervised rank aggregation function can further improve ranking performance. In this paper, the differences among queries are taken into consideration, and a supervised rank aggregation framework based on query similarity is proposed. The approach builds one base ranker from each query and its relevant documents, and then employs a supervised aggregation function to learn the weights of these base rankers. We propose an aggregation function that directly optimizes the performance measure NDCG, referred to as RankAgg.NDCG. We prove that RankAgg.NDCG achieves better performance than a linear combination of the base rankers. Experiments on real-world datasets show that our approach outperforms conventional ranking approaches.
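To make the quantities in the abstract concrete, the following minimal sketch computes NDCG for a ranking obtained by linearly combining the scores of several base rankers, i.e. the baseline against which RankAgg.NDCG is compared. It does not reproduce the training procedure of RankAgg.NDCG, which the abstract does not specify. The function names, the data layout, and the choice of the common (2^rel - 1) / log2(rank + 1) form of DCG are assumptions made for illustration only.

```python
import math


def dcg_at_k(relevances, k):
    """DCG over the top-k graded labels, given in ranked order.

    Assumes the common gain 2^rel - 1 and discount log2(rank + 1);
    the source does not state which NDCG variant it uses.
    """
    return sum((2 ** rel - 1) / math.log2(pos + 2)
               for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """NDCG@k: DCG of the given order divided by DCG of the ideal order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def aggregate_scores(base_scores, weights):
    """Weighted linear combination of base ranker scores (hypothetical helper).

    base_scores[i][j] is the score that base ranker i gives to document j.
    """
    n_docs = len(base_scores[0])
    return [sum(w * scores[j] for w, scores in zip(weights, base_scores))
            for j in range(n_docs)]


def ndcg_of_aggregate(base_scores, weights, labels, k):
    """Rank documents by the combined score and evaluate NDCG@k."""
    combined = aggregate_scores(base_scores, weights)
    order = sorted(range(len(labels)), key=lambda j: combined[j], reverse=True)
    return ndcg_at_k([labels[j] for j in order], k)


if __name__ == "__main__":
    # Toy data: two base rankers scoring three documents for one query.
    labels = [2, 0, 1]
    base_scores = [[0.9, 0.1, 0.5], [0.2, 0.8, 0.3]]
    for weights in ([1.0, 0.0], [0.5, 0.5], [0.0, 1.0]):
        print(weights, ndcg_of_aggregate(base_scores, weights, labels, k=3))
```

In this toy setting, the weight vector decides which base ranker dominates the final order, so NDCG changes with the weights. A supervised aggregation method learns such weights from training queries rather than fixing them by hand.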