Analyzing the lineage (also called provenance) of uncertain data means tracing the sources of uncertainty through the processes by which the data are produced and evolve. To describe complex correlations among uncertain data objects together with their uncertainty, and to guarantee theoretically the correctness of probability computation in lineage analysis, we study a method for representing the lineage of uncertain data based on the Bayesian network, an important probabilistic graphical model. Starting from the Boolean formulas of lineage and from the uncertain data themselves, we propose a method for transforming Boolean formulas into equivalent Bayesian networks. We also discuss the corresponding conditional independence properties and probabilistic semantics. Case studies and experimental results show that the proposed method provides an effective and extensible framework for representing data correlations and computing probabilities in lineage analysis.
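The following is a minimal sketch of the general idea of turning a lineage Boolean formula into a Bayesian network. It rests on several assumptions. Base tuples are treated as mutually independent Boolean random variables with known marginal probabilities. Each connective (AND, OR, NOT) becomes a node whose conditional probability table is deterministic. A base tuple that occurs several times in the formula maps to a single shared node. This is a standard construction and not necessarily the transformation proposed in the paper. The names `BayesNet` and `to_bayes_net` and the example probabilities are hypothetical. Inference uses brute-force enumeration over all nodes, which is exponential and serves only to illustrate the semantics.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Tuple, Union


@dataclass(frozen=True)
class Var:            # a base (uncertain) tuple
    name: str

@dataclass(frozen=True)
class Not:
    child: "Formula"

@dataclass(frozen=True)
class And:
    children: Tuple["Formula", ...]

@dataclass(frozen=True)
class Or:
    children: Tuple["Formula", ...]

Formula = Union[Var, Not, And, Or]


class BayesNet:
    """Boolean Bayesian network; each CPT returns P(node=True | parent values)."""

    def __init__(self) -> None:
        self.order: List[str] = []                      # topological order
        self.parents: Dict[str, List[str]] = {}
        self.cpt: Dict[str, Callable[[Tuple[bool, ...]], float]] = {}

    def add(self, name: str, parents: List[str], cpt) -> str:
        # Reuse an existing node so a shared base tuple stays one variable.
        if name not in self.parents:
            self.order.append(name)
            self.parents[name] = parents
            self.cpt[name] = cpt
        return name

    def joint(self, a: Dict[str, bool]) -> float:
        # Chain rule of a Bayesian network: product of P(node | parents).
        p = 1.0
        for n in self.order:
            pt = self.cpt[n](tuple(a[q] for q in self.parents[n]))
            p *= pt if a[n] else 1.0 - pt
        return p

    def prob(self, query: str) -> float:
        # Marginal P(query=True) by summing the joint over all assignments.
        total = 0.0
        for values in product([False, True], repeat=len(self.order)):
            a = dict(zip(self.order, values))
            if a[query]:
                total += self.joint(a)
        return total


def to_bayes_net(formula: Formula, priors: Dict[str, float]) -> Tuple[BayesNet, str]:
    bn = BayesNet()

    def build(f: Formula) -> str:
        if isinstance(f, Var):
            p = priors[f.name]
            return bn.add(f.name, [], lambda _v, p=p: p)       # root node, prior p
        if isinstance(f, Not):
            c = build(f.child)
            return bn.add(f"not({c})", [c], lambda v: 0.0 if v[0] else 1.0)
        kids = [build(c) for c in f.children]                   # children first
        label = ",".join(kids)
        if isinstance(f, And):
            return bn.add(f"and({label})", kids, lambda v: 1.0 if all(v) else 0.0)
        if isinstance(f, Or):
            return bn.add(f"or({label})", kids, lambda v: 1.0 if any(v) else 0.0)
        raise TypeError(f"unknown formula node: {f!r}")

    return bn, build(formula)


# Hypothetical lineage of a result tuple: (x1 AND x2) OR (x1 AND x3).
x1, x2, x3 = Var("x1"), Var("x2"), Var("x3")
lineage = Or((And((x1, x2)), And((x1, x3))))
priors = {"x1": 0.5, "x2": 0.4, "x3": 0.6}

bn, root = to_bayes_net(lineage, priors)
print(round(bn.prob(root), 4))  # 0.38
```

Because `x1` is a single shared node, the result equals P(x1)·P(x2 ∨ x3) = 0.5 × 0.76 = 0.38. If the two conjuncts were instead treated as independent events, the result would be 1 − (1 − 0.2)(1 − 0.3) = 0.44. This gap is the kind of correlation-induced error that an explicit graphical representation of lineage is meant to prevent.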