Most existing keyword search methods for eXtensible Markup Language (XML) generate results from Lowest Common Ancestor (LCA) semantic subtrees, and therefore miss data that lies outside those subtrees but is relevant to the user's search intention. To address this problem, an XML keyword query method based on extended query expressions is proposed. User query logs serve as the statistical model for query expansion: the logs are analyzed statistically and, combined with the concept of optimal retrieval, used to judge whether a query expression needs to be expanded. An XML TF-IDF method then computes the weights of candidate attributes, and, based on the context information of the initial retrieval results, a clustering method selects the expansion keywords most relevant to the query intention, from which the expanded query expression is generated. Experimental results show that, compared with XSeek and a semantic-dictionary-based query expansion method, the proposed method improves the average F-measure by 7% and 17%, respectively, indicating high query quality.
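The weighting step can be illustrated with a minimal sketch. It uses the textbook TF-IDF formula over plain term lists, not the XML-specific TF-IDF variant of the proposed method, whose details are not given here. The function names `tf_idf_weights` and `top_expansion_terms`, the toy data, and the ranking by summed weight (in place of the clustering step) are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf_weights(documents):
    """Return one {term: weight} dict per document using textbook TF-IDF.

    Each document is a list of terms, e.g. the attribute names found in
    one initial retrieval result (a hypothetical representation).
    """
    n_docs = len(documents)
    # Document frequency: number of documents that contain each term.
    df = Counter()
    for doc in documents:
        df.update(set(doc))

    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def top_expansion_terms(documents, query_terms, k=2):
    """Rank non-query terms by summed TF-IDF weight (a simplification;
    the method described above uses clustering at this point)."""
    scores = Counter()
    for doc_weights in tf_idf_weights(documents):
        for term, weight in doc_weights.items():
            if term not in query_terms:
                scores[term] += weight
    return [term for term, _ in scores.most_common(k)]


# Toy initial results for the query {"xml", "keyword"}.
results = [
    ["xml", "keyword", "author"],
    ["xml", "keyword", "title"],
    ["xml", "author", "year"],
]
candidates = top_expansion_terms(results, {"xml", "keyword"})
```

In this toy data, a term that appears in every result (`xml`) receives weight zero, so only terms that discriminate between results can become expansion candidates.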