Application of Machine-Learning-Based Query Expansion in Blog Retrieval

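ISSN: 1003-0077
Journal: Journal of Chinese Information Processing (中文信息学报)
Date: 0
Pages: 99-102
Language: Chinese
Classification: TP391 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
Author affiliation: [1] School of Computer Science and Technology, Fudan University, Shanghai 200083
Funding: National Natural Science Foundation of China (60673038, 60503070)
Related project: Research on Sentiment Orientation Mining Techniques for Chinese Text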

Keywords: computer application, Chinese information processing, information retrieval, query expansion, machine learning

Abstract (translated from the Chinese):

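This paper presents a new query expansion method that combines query expansion techniques with machine learning theory. Expansion terms are selected by a machine learning method in order to improve retrieval performance. For an input query, a set of candidate expansion terms is first generated by pseudo-relevance feedback; a support vector machine then scores the candidates, and the higher-scoring candidates are combined with the original query to form a new query. Because training data for this support vector machine is hard to obtain, we generate it automatically from the retrieval results of evaluation conferences and from retrieval tools. The advantage of this method is that, by learning from the training corpus, it can make more reasonable choices among candidate expansion terms. In the opinion retrieval task organized by the TREC evaluation conference, the method improves MAP by 33.1% relative to a baseline system that uses no expansion technique.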
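
The expansion pipeline described in this abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the two features (term frequency and document frequency in the feedback documents), the weights, the bias, and all function names are assumptions, and a hand-set linear decision function stands in for the trained support vector machine.

```python
from collections import Counter


def candidate_terms(query, ranked_docs, k=2, max_candidates=10):
    """Pseudo-relevance feedback: treat the top-k documents of an initial
    ranking as relevant and collect their most frequent non-query terms."""
    query_terms = set(query)
    counts = Counter()
    for doc in ranked_docs[:k]:
        counts.update(t for t in doc if t not in query_terms)
    return [term for term, _ in counts.most_common(max_candidates)]


def features(term, ranked_docs, k=2):
    """Hypothetical feature vector: term frequency and document frequency
    of the candidate within the top-k feedback documents."""
    top = ranked_docs[:k]
    tf = sum(doc.count(term) for doc in top)
    df = sum(1 for doc in top if term in doc)
    return [float(tf), float(df)]


def svm_score(x, weights, bias):
    """Decision value of a linear SVM, w . x + b (weights are assumed here,
    not learned)."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def expand_query(query, ranked_docs, weights, bias, k=2, n_expand=2):
    """Score every candidate and append the n_expand best ones to the query."""
    candidates = candidate_terms(query, ranked_docs, k)
    ranked = sorted(
        candidates,
        key=lambda t: svm_score(features(t, ranked_docs, k), weights, bias),
        reverse=True,
    )
    return list(query) + ranked[:n_expand]


# Toy data: tokenized documents in the order returned by an initial retrieval.
docs = [
    ["iphone", "battery", "great", "battery"],
    ["iphone", "battery", "screen", "bad"],
    ["weather", "today"],
]
expanded = expand_query(["iphone"], docs, weights=[0.5, 1.0], bias=-1.0)
```

In a real system the weights would come from an SVM trained on automatically generated labels, as the abstract describes; the toy weights above only show where the learned scores enter the selection step.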

English abstract:

A novel query expansion approach is presented in this paper, which applies machine learning techniques to query expansion. It improves retrieval performance by training a machine learning module to predict and select the query expansion words. With pseudo-relevance feedback, a set of candidate expansion words is generated for a given topic. Then a Support Vector Machine (SVM) judges these candidate words and forms an optimized query by selecting the top candidates. Training such an SVM for query word judgment is difficult because the training data set is unavailable. This issue is resolved by generating the training data set from the available retrieval results and evaluation tools. In the opinion retrieval task of the Blog Track held by the TREC conference, this query expansion method improves the Mean Average Precision (MAP) by 33.1% compared with the baseline result.
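
For readers unfamiliar with the reported metric, the sketch below shows how Mean Average Precision and a relative gain over a baseline are commonly computed. It assumes the 33.1% figure is a relative improvement; the function names and the toy rankings are illustrative and are not taken from the paper.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of the precision values at each rank where a relevant
    document appears, divided by the total number of relevant documents."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """runs: list of (ranked_ids, relevant_ids) pairs, one per topic."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def relative_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline`, as a fraction."""
    return (improved - baseline) / baseline


# Toy example with made-up rankings for two topics.
toy_runs = [
    (["d1", "d2", "d3"], ["d1", "d3"]),
    (["d4", "d5"], ["d5"]),
]
toy_map = mean_average_precision(toy_runs)
```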
