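Uncertain data management is gradually becoming an important research direction. The management of uncertain XML data, XML being an important standard for data exchange on the Web, has likewise become a research hotspot, and keyword-based retrieval over probabilistic XML is one of its important branches. Existing studies of keyword search over probabilistic XML consider only the independent (IND) and mutually exclusive (MUX) relationships between nodes; more general dependencies between nodes have received little attention because they are complex to represent and compute. This paper discusses keyword filtering under SLCA (smallest lowest common ancestor) semantics in the probabilistic XML model PrXML^{exp,ind,mux}, in which EXP nodes describe more general dependencies between nodes. After defining tab, the keyword probability distribution table of a subtree, together with its associated operations, the paper gives methods for computing the tab of each type of node in the model, and an algorithm that computes the probability of a node being an SLCA directly, without constructing possible worlds. Experiments evaluate the characteristics and performance of the algorithm.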
Uncertain data management is becoming an important research focus. Managing uncertain XML data, XML being the main standard for storing and exchanging Web data, has naturally become a hot topic, and one of its branches is keyword search over probabilistic XML. Recent work on keyword search over probabilistic XML has considered only the independent and mutually exclusive relationships among sibling nodes; because of the complexity of representation and computation, more general relationships among sibling nodes have received little attention so far. This paper addresses keyword filtering over the probabilistic XML data model PrXML^{exp,ind,mux}, in which exp nodes represent more general relationships among sibling nodes. We define tab, the keyword probability distribution table of a subtree, together with its dot product, Cartesian product, and addition operations, and then show how to compute the tab of each type of node. We further give an algorithm that finds the SLCA nodes and the probability of each node being an SLCA without generating possible worlds. Finally, the features and efficiency of our method are evaluated through extensive experiments.
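The abstract names the tab operations but does not define them, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the paper's algorithm. It assumes: a tab is a dict from a keyword-set bitmask to a probability, conditioned on the subtree root existing; "dot product" is read as scaling a table by an edge probability, "Cartesian product" as combining two independent tables by OR-ing their keyword sets, and "addition" as merging tables entry by entry; an exp node lists explicit child subsets with their probabilities. The extra sentinel key (one bit above the keyword bits), the `Node` class, and the functions `scale`, `add`, `product`, and `slca_probabilities` are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Tab = Dict[int, float]  # keyword-set bitmask -> probability (hypothetical encoding)


@dataclass
class Node:
    kind: str  # "ord" (ordinary) or distributional "ind", "mux", "exp"
    name: str = ""
    mask: int = 0  # keywords carried by an ordinary node, as a bitmask
    children: List["Node"] = field(default_factory=list)
    probs: List[float] = field(default_factory=list)  # edge probabilities (ind, mux)
    subsets: List[Tuple[Tuple[int, ...], float]] = field(default_factory=list)  # exp


def scale(p: float, t: Tab) -> Tab:
    """Assumed 'dot product': multiply every entry by a probability."""
    return {m: p * q for m, q in t.items()}


def add(t1: Tab, t2: Tab) -> Tab:
    """Assumed 'addition': merge two tables, summing entries with the same key."""
    out = dict(t1)
    for m, q in t2.items():
        out[m] = out.get(m, 0.0) + q
    return out


def product(t1: Tab, t2: Tab, sentinel: int) -> Tab:
    """Assumed 'Cartesian product' of two independent tables: keyword sets are OR-ed."""
    out: Tab = {}
    for m1, q1 in t1.items():
        for m2, q2 in t2.items():
            m = m1 | m2
            if m & sentinel:  # one ordinary descendant already covers every keyword
                m = sentinel
            out[m] = out.get(m, 0.0) + q1 * q2
    return out


def slca_probabilities(root: Node, k: int) -> Dict[str, float]:
    """Pr(node is an SLCA) for each named ordinary node, given k query keywords."""
    full, sentinel = (1 << k) - 1, 1 << k
    result: Dict[str, float] = {}

    def tab(node: Node, p_exists: float) -> Tab:
        # Distribution of the keyword set this node contributes to its nearest
        # ordinary ancestor, conditioned on the node itself existing.
        if node.kind == "ord":
            t: Tab = {node.mask: 1.0}
            for c in node.children:
                t = product(t, tab(c, p_exists), sentinel)
            # `full` without the sentinel: all keywords meet here but in no
            # single ordinary descendant, so this node is an SLCA.
            result[node.name] = p_exists * t.get(full, 0.0)
            if full in t:
                t[sentinel] = t.get(sentinel, 0.0) + t.pop(full)
            return t
        if node.kind == "ind":
            t = {0: 1.0}
            for c, p in zip(node.children, node.probs):
                t = product(t, add(scale(p, tab(c, p_exists * p)), {0: 1.0 - p}), sentinel)
            return t
        if node.kind == "mux":
            t = {0: 1.0 - sum(node.probs)}
            for c, p in zip(node.children, node.probs):
                t = add(t, scale(p, tab(c, p_exists * p)))
            return t
        if node.kind == "exp":
            # Marginal probability that each child is kept (used for existence only).
            marg = [sum(q for s, q in node.subsets if i in s) for i in range(len(node.children))]
            tabs = [tab(c, p_exists * m) for c, m in zip(node.children, marg)]
            t = {0: 1.0 - sum(q for _, q in node.subsets)}
            for s, q in node.subsets:
                sub: Tab = {0: 1.0}
                for i in s:
                    sub = product(sub, tabs[i], sentinel)
                t = add(t, scale(q, sub))
            return t
        raise ValueError(f"unknown node kind: {node.kind}")

    tab(root, 1.0)
    return result
```

For instance, an ordinary root over an ind node whose ordinary children carry keyword 1 and keyword 2 with edge probabilities 0.8 and 0.5 gets an SLCA probability of 0.8 × 0.5 = 0.4. Each subtree is visited once and every table holds at most 2^k + 1 entries, so in this sketch the cost depends on the number of keyword subsets rather than on the number of possible worlds.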