构建结果子树是XML关键字查询处理的核心问题,其中求解与每个子树根节点相关的关键字节点是影响结果子树构建效率的重要步骤。针对已有方法不能正确求解基于ELCA(exclusive lowest common ancestor)语义的相关关键字节点(RKN,relevant keyword node)的问题,提出RKN的形式化定义及相应的RKN—Base算法。该算法通过顺序扫描每个关键字节点一次即可正确判断其是否为某个ELCA节点的RKN。针对RKN.Base不能避免处理无用节点的问题,提出一种优化算法RKN-Optimized,该算法基于每个ELCA节点求其RKN集合,从而避免了对无用节点的处理,降低了时间复杂度。最后,通过实验验证了所提算法的高效性。
Subtree results construction is a core problem in keyword query processing over XML data, for which computing the set of relevant keyword nodes (RKN) for each subtree's root node will greatly affect the overall system performance. Considering that existing methods cannot correctly identify RKN for ELCA semantics, the definitions of RKN and the RKN-Base algorithm were proposed, which can correctly judge whether a given node is an RKN of some ELCA node by sequentially scanning the set of inverted lists once. As RKN-Base cannot avoid processing all useless nodes, an optimized algorithm, namely RKN-Optimized, was then proposed, which computes RKN sets based on the set of ELCA nodes, rather than the set of inverted lists as RKN-Base does. As a result, RKN-Optimized avoids processing useless nodes, and reduces the time complexity. The experimental results verified the efficiency of the proposed algonthms.