Query expansion methods based on search logs extract expansion terms that are strongly biased by term popularity. As a result, the knowledge they cover within a given period is limited, and some search results that would satisfy the user's information need are missed. To address this problem, this paper proposes a hybrid query expansion method that balances popularity and similarity. The method first clusters user search logs and matches the user's query terms against the clusters to generate a log-based expansion term set. It then uses an ontology to generate classified sets of expansion terms for the user's query. Based on these term classes, it computes the semantic coverage of the log-based expansion set; for log-based sets with insufficient coverage, it fuses the two expansion sets using the combination rule of evidence theory to obtain high-quality expansion terms. Experimental results show that the method improves the retrieval performance of the search engine, better meets users' information needs, and increases user satisfaction with search.
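The coverage check and fusion step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the coverage definition (the share of ontology classes that contain at least one log-based term), the threshold, the `reliability` discount, and all function names are assumptions made for illustration. The fusion uses Dempster's rule of combination, with each source's leftover mass assigned to the whole set of candidate terms so that terms known to only one source are not discarded.

```python
from itertools import product


def semantic_coverage(log_terms, ontology_classes):
    """Assumed definition: fraction of ontology classes hit by a log-based term."""
    if not ontology_classes:
        return 1.0
    log_terms = set(log_terms)
    hits = sum(1 for terms in ontology_classes.values() if log_terms & set(terms))
    return hits / len(ontology_classes)


def to_mass(scores, frame, reliability):
    """Turn term scores into a mass function over singleton hypotheses.

    The remaining (1 - reliability) mass goes to the whole frame (ignorance).
    """
    total = sum(scores.values())
    mass = {frozenset([t]): reliability * s / total for t, s in scores.items()}
    theta = frozenset(frame)
    mass[theta] = mass.get(theta, 0.0) + (1.0 - reliability)
    return mass


def dempster_combine(m1, m2):
    """Dempster's rule: multiply masses, keep non-empty intersections, renormalize."""
    combined, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y  # mass assigned to contradictory hypotheses
    if 1.0 - conflict <= 1e-12:
        raise ValueError("sources are in total conflict")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


def hybrid_expand(log_scores, onto_classes, onto_scores,
                  threshold=0.6, reliability=0.8, top_k=5):
    """Use the log-based terms alone unless their semantic coverage is too low."""
    if semantic_coverage(log_scores, onto_classes) >= threshold:
        return sorted(log_scores, key=log_scores.get, reverse=True)[:top_k]
    frame = set(log_scores) | set(onto_scores)
    fused = dempster_combine(to_mass(log_scores, frame, reliability),
                             to_mass(onto_scores, frame, reliability))
    singles = {next(iter(k)): v for k, v in fused.items() if len(k) == 1}
    return sorted(singles, key=singles.get, reverse=True)[:top_k]


# Hypothetical data for the query "apple": the log favours popular product terms.
log_scores = {"iphone": 0.6, "ipad": 0.4}
onto_classes = {"company": ["iphone", "macbook"], "fruit": ["orchard", "cider"]}
onto_scores = {"iphone": 0.4, "orchard": 0.3, "cider": 0.3}
expansion = hybrid_expand(log_scores, onto_classes, onto_scores)
```

In this toy input the log-based set covers only one of the two ontology classes, so the fusion branch runs and the ontology-derived terms for the uncovered class enter the final ranking alongside the popular log terms.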