Keyword search is the preferred way for most ordinary users to find information, and probabilistic XML is a representation of uncertain data that has recently attracted considerable attention. This paper studies keyword search over probabilistic XML data. First, we adopt ELCA, a result-set semantics widely accepted for deterministic XML data, and extend it to define the ELCA result set over probabilistic XML data. Second, based on this definition, we present an algorithm for ELCA keyword search over probabilistic XML data and introduce the notion of a probability threshold in its implementation. Finally, experiments on synthetic data sets demonstrate the efficiency and effectiveness of the proposed search algorithm.
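To make the notions of an ELCA result and a probability threshold concrete, the following is a minimal sketch and not the algorithm of this paper. It assumes a simplified model in which every node exists independently with a given probability once its parent exists, and it enumerates all possible worlds by brute force, which is only feasible for toy documents. The document `DOC`, the function names, and the threshold parameter `tau` are hypothetical and chosen for illustration only.

```python
from itertools import product

# Hypothetical toy document (an assumption, not data from the paper):
# node id -> (parent id, existence probability given the parent, keywords in own text).
# Parents must be listed before their children, and the root must be certain.
DOC = {
    "r": (None, 1.0, set()),
    "a": ("r", 1.0, {"xml"}),
    "b": ("r", 0.6, {"keyword"}),
    "c": ("a", 0.5, {"keyword"}),
}


def elcas(nodes, query):
    """Return the ELCA nodes of one deterministic tree for a set of query keywords."""
    kids = {n: [] for n in nodes}
    for n, (parent, _, _) in nodes.items():
        if parent is not None:
            kids[parent].append(n)
    root = next(n for n, (parent, _, _) in nodes.items() if parent is None)
    result = set()

    def visit(n):
        own = nodes[n][2] & query
        contained = set(own)   # query keywords anywhere in the subtree of n
        exclusive = set(own)   # keywords left after excluding full child subtrees
        for c in kids[n]:
            sub = visit(c)
            contained |= sub
            if sub != query:   # child subtree lacks some keyword, so it still counts
                exclusive |= sub
        if exclusive == query:
            result.add(n)
        return contained

    visit(root)
    return result


def elca_probabilities(doc, query):
    """Sum, over all possible worlds, the probability that each node is an ELCA."""
    uncertain = [n for n, (_, p, _) in doc.items() if p < 1.0]
    probs = {}
    for choice in product([True, False], repeat=len(uncertain)):
        kept = dict(zip(uncertain, choice))
        weight = 1.0
        for n in uncertain:
            p = doc[n][1]
            weight *= p if kept[n] else 1.0 - p
        # A node is in this world if it was kept and its parent is in the world.
        world = {}
        for n, (parent, _, _) in doc.items():
            if kept.get(n, True) and (parent is None or parent in world):
                world[n] = doc[n]
        for n in elcas(world, query):
            probs[n] = probs.get(n, 0.0) + weight
    return probs


def threshold_elcas(doc, query, tau):
    """Keep only nodes whose probability of being an ELCA is at least tau."""
    return {n: p for n, p in elca_probabilities(doc, query).items() if p >= tau}


if __name__ == "__main__":
    print(threshold_elcas(DOC, {"xml", "keyword"}, 0.4))
```

In this toy document, node `a` is an ELCA whenever `c` exists (probability 0.5), and the root `r` is an ELCA only when `b` exists and `c` does not (probability 0.3), so a threshold of 0.4 keeps `a` alone. An efficient algorithm would avoid enumerating worlds; the sketch only illustrates what a threshold-filtered ELCA result set means under the stated assumptions.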