Identifying the intent behind search engine queries is an actively studied problem in web information retrieval. This paper proposes an approach to identifying web query intent that combines several types of features. Query intent identification is treated as a classification problem, and classification features are extracted from different types of resources: the query text, the content returned by the search engine, and web query logs. The approach is evaluated on a manually annotated corpus of real web queries. The experimental results show that each type of feature helps to improve identification performance, and that when all features are combined, 88.5% of the test queries are assigned the correct intent.
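As a rough illustration of combining features from the three sources into one classification input, the following minimal sketch builds a single sparse feature dictionary per query. The function name `extract_features`, the feature prefixes, and the choice of whitespace tokens and clicked-URL hosts as features are assumptions made for this example; the abstract does not specify the concrete features or the classifier used.

```python
from collections import Counter
from urllib.parse import urlparse


def extract_features(query, result_snippets, clicked_urls):
    """Merge features from three sources into one sparse feature dict.

    All feature definitions here are illustrative assumptions,
    not the features used in the paper.
    """
    features = Counter()

    # Source 1: query text (lowercased whitespace tokens).
    for token in query.lower().split():
        features["q:" + token] += 1

    # Source 2: content returned by the search engine (snippet tokens).
    for snippet in result_snippets:
        for token in snippet.lower().split():
            features["s:" + token] += 1

    # Source 3: query log (hosts of URLs clicked for this query).
    for url in clicked_urls:
        features["c:" + urlparse(url).netloc] += 1

    return dict(features)


example = extract_features(
    "cheap flights",
    ["Book cheap flights online"],
    ["http://example.com/deals"],
)
```

The prefixes keep features from different sources apart, so a classifier trained on these dictionaries can weight the same word differently depending on whether it came from the query or from the returned content.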