This paper presents a novel query expansion approach that combines query expansion with machine learning: a learned model selects the expansion terms, which improves retrieval performance. For an input query, pseudo-relevance feedback first generates a set of candidate expansion terms. A Support Vector Machine (SVM) then scores the candidates, and the highest-scoring ones are combined with the original query terms to form a new query. Training data for such an SVM is difficult to obtain, so we generate it automatically from the retrieval results and evaluation tools of an evaluation conference. The advantage of the approach is that, by learning from training data, it makes more reasonable choices among the candidate expansion terms. In the opinion retrieval task of the TREC Blog Track, the method improves Mean Average Precision (MAP) by 33.1% over a baseline system that uses no query expansion.
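The pipeline described in the abstract can be illustrated with a minimal, stdlib-only Python sketch. It is not the authors' implementation. Everything here is an assumption made for illustration: the toy term-overlap `search` function, the invented documents, the two features in `features` (share of feedback documents containing the term, and its IDF), the labeling rule (a candidate is positive if appending it alone raises average precision on a training topic, one plausible reading of "generating training data from retrieval results and evaluation tools"), and the Pegasos-style linear SVM trainer. `average_precision` is the per-topic quantity that MAP averages over topics.

```python
import math
import random
from collections import Counter

# NOTE: every name below is hypothetical; this illustrates the technique only.


def search(query, docs):
    """Toy retrieval: rank doc ids by total query-term occurrences (stable on ties)."""
    scores = {d: sum(toks.count(t) for t in query) for d, toks in docs.items()}
    return sorted(docs, key=lambda d: -scores[d])


def average_precision(ranking, relevant):
    """Non-interpolated AP for one topic; MAP is its mean over topics."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def candidates(query, docs, k=3, n=5):
    """Pseudo-relevance feedback: the n most frequent non-query terms in the top-k docs."""
    feedback = search(query, docs)[:k]
    counts = Counter(t for d in feedback for t in docs[d] if t not in query)
    return [t for t, _ in counts.most_common(n)], feedback


def features(term, feedback, docs):
    """Assumed features: share of feedback docs containing the term, its IDF, and a bias 1.0."""
    in_fb = sum(term in docs[d] for d in feedback) / len(feedback)
    df = sum(term in toks for toks in docs.values())
    return [in_fb, math.log((len(docs) + 1) / (df + 1)), 1.0]


def label(query, term, docs, relevant):
    """Assumed labeling rule: +1 if appending the term raises AP on this topic, else -1."""
    before = average_precision(search(query, docs), relevant)
    after = average_precision(search(query + [term], docs), relevant)
    return 1 if after > before else -1


def build_training_set(topics, docs):
    """topics: (query_terms, relevant_doc_ids) pairs, e.g. derived from past relevance judgments."""
    X, y = [], []
    for query, relevant in topics:
        cands, feedback = candidates(query, docs)
        for term in cands:
            X.append(features(term, feedback, docs))
            y.append(label(query, term, docs, relevant))
    return X, y


def train_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Linear SVM via Pegasos: stochastic subgradient descent on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    w, t = [0.0] * len(X[0]), 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1 - eta * lam) * wj for wj in w]  # shrink step from the regularizer
            if margin < 1:  # hinge loss is active: move toward the correct side
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def expand(query, docs, w, m=2):
    """Keep the original query terms and append the m candidates the SVM scores highest."""
    cands, feedback = candidates(query, docs)

    def svm_score(term):
        return sum(wj * xj for wj, xj in zip(w, features(term, feedback, docs)))

    return query + sorted(cands, key=svm_score, reverse=True)[:m]


# Invented toy collection and one training topic, for illustration only.
docs = {
    "d1": "opinion blog review good camera".split(),
    "d2": "camera specs price".split(),
    "d3": "blog review opinion love camera".split(),
    "d4": "weather news today".split(),
}
X, y = build_training_set([(["camera"], {"d1", "d3"})], docs)
w = train_svm(X, y)
expanded = expand(["camera"], docs, w)
print(expanded)
```

In a realistic setting the toy `search` would be replaced by a real retrieval engine, the features by richer term statistics, and the hand-written trainer by a library SVM such as scikit-learn's `LinearSVC`; the overall structure (candidate generation, automatic labeling against relevance judgments, scoring, top-m selection) stays the same.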