A cross-lingual pseudo-relevance feedback model based on bilingual topics is proposed for the cross-language information retrieval task. The latent Dirichlet allocation (LDA) model is extended into a topic model that describes bilingual documents jointly, in which each topic can generate both source-language and target-language terms. Bilingual topics relevant to the query are selected, and the relevant terms within them are used to refine and expand the query translation, yielding a new query for a second retrieval pass. Experimental results show that cross-language retrieval based on this feedback model outperforms other feedback strategies, including those based on monolingual topic models and on the vector space model.
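The expansion step summarized above (select the topics relevant to the query, then use their target-language terms to refine the translated query) might be sketched as follows. This is only an illustrative sketch, not the paper's actual formulation: the function name `expand_query`, the parameters `n_topics`, `n_terms`, and `alpha`, the linear interpolation, and the toy distributions are all assumptions introduced here. The topic-relevance scores and per-topic term distributions are taken as given, as if already estimated by a bilingual topic model.

```python
from collections import defaultdict


def expand_query(translated_query, topic_relevance, topic_target_words,
                 n_topics=2, n_terms=3, alpha=0.6):
    """Hypothetical topic-based expansion of a translated query.

    translated_query:   {term: weight}, weights assumed to sum to 1
    topic_relevance:    {topic_id: relevance of the topic to the query}
    topic_target_words: {topic_id: {target-language term: P(term | topic)}}
    alpha:              weight kept on the original query translation (assumed)
    """
    # Keep only the topics judged most relevant to the query.
    top_topics = sorted(topic_relevance, key=topic_relevance.get,
                        reverse=True)[:n_topics]
    norm = sum(topic_relevance[t] for t in top_topics)

    # Score each target-language term by its probability under the
    # selected topics, weighted by the renormalized topic relevance.
    feedback = defaultdict(float)
    for t in top_topics:
        for term, prob in topic_target_words[t].items():
            feedback[term] += (topic_relevance[t] / norm) * prob

    # Keep the highest-scoring feedback terms and renormalize their scores.
    best = sorted(feedback, key=feedback.get, reverse=True)[:n_terms]
    total = sum(feedback[term] for term in best)

    # Interpolate the original translation with the feedback terms.
    new_query = defaultdict(float)
    for term, weight in translated_query.items():
        new_query[term] += alpha * weight
    for term in best:
        new_query[term] += (1 - alpha) * feedback[term] / total
    return dict(new_query)


# Toy data, invented for illustration only.
query = {"bank": 0.5, "loan": 0.5}
relevance = {0: 0.7, 1: 0.2, 2: 0.1}
topics = {
    0: {"bank": 0.4, "credit": 0.3, "interest": 0.3},
    1: {"loan": 0.5, "mortgage": 0.5},
    2: {"river": 0.6, "shore": 0.4},
}
expanded = expand_query(query, relevance, topics)
```

In this toy setting the low-relevance topic contributes nothing, so its terms (`river`, `shore`) never enter the expanded query, while terms from the selected topics are added alongside the original translation. The resulting weighted query would then be submitted for the second retrieval pass.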