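Data provenance describes the entire process by which data is produced and evolves over time. It has a wide range of applications, including incremental view maintenance, trust assessment, and query evaluation in probabilistic databases. This paper studies the computation of semiring provenance for Datalog queries. Based on the characteristics of the semiring provenance model, we first propose a magic-based provenance computation method. Next, to address the problems that frequent data updates cause for semiring provenance computation, we propose a semiring provenance computation method based on the structure of derivation trees. Finally, to eliminate the redundancy of semiring provenance representations, in particular the formal power series produced by recursion, we propose an essential provenance representation, which records only the derivation paths indispensable for producing a result tuple. Extensive experiments verify the feasibility and effectiveness of the proposed methods.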
Data provenance describes how data is generated and how it evolves over time. It has many applications, including incremental view maintenance, trust evaluation, and query evaluation in probabilistic databases. We study semiring provenance for Datalog. First, we propose a magic-based approach to optimize the computation of semiring provenance. Next, we analyze the structural characteristics of derivation trees for semiring provenance and propose an optimization algorithm designed for frequent and complex data updates. Then we describe essential provenance, which captures the core derivations needed to produce each query-result tuple and thereby improves the efficiency of provenance computation and storage. Experiments show that the proposed techniques achieve better efficiency and scalability than existing approaches.
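To make "semiring provenance for Datalog" and the recursion issue concrete, here is a minimal sketch of the standard semiring-annotation idea on a recursive transitive-closure program. It is not the paper's magic-based or derivation-tree-based algorithm, and it is not claimed to implement essential provenance. Assumptions (all hypothetical, for illustration only): edges are tagged with variables `e1`..`e4`; annotations use an absorptive "minimal witness set" semiring (each annotation is a set of minimal sets of base variables); evaluation is a plain naive fixpoint in `path_provenance`.

```python
from itertools import product

# An annotation is an antichain of "witnesses"; each witness is the set of
# base-tuple variables used together by one derivation (hypothetical encoding).
ZERO = frozenset()  # no derivation at all


def minimize(witnesses):
    """Drop every witness that strictly contains another (absorption: a + a*b = a)."""
    ws = set(witnesses)
    return frozenset(w for w in ws if not any(v < w for v in ws))


def plus(a, b):
    """Semiring addition: alternative derivations of the same tuple."""
    return minimize(a | b)


def times(a, b):
    """Semiring multiplication: joint use of two inputs in one rule body."""
    return minimize(x | y for x, y in product(a, b))


def var(name):
    """Annotation of a base tuple tagged with the provenance variable `name`."""
    return frozenset({frozenset({name})})


def path_provenance(edges):
    """Naive fixpoint evaluation of
         path(X, Y) :- edge(X, Y).
         path(X, Y) :- path(X, Z), edge(Z, Y).
    `edges` maps (x, y) to a variable name; returns (x, y) -> annotation."""
    edge_ann = {e: var(v) for e, v in edges.items()}
    path = {}
    while True:
        new = dict(edge_ann)  # first rule: every edge is a path
        for (x, z), p in path.items():  # second rule: extend a path by one edge
            for (z2, y), e in edge_ann.items():
                if z == z2:
                    new[(x, y)] = plus(new.get((x, y), ZERO), times(p, e))
        # Finitely many variables plus absorption make the iteration stabilize.
        if new == path:
            return path
        path = new


def show(ann):
    """Render an annotation as a sum of products, e.g. 'e1*e2 + e3'."""
    return " + ".join(sorted("*".join(sorted(w)) or "1" for w in ann)) or "0"


if __name__ == "__main__":
    edges = {("a", "b"): "e1", ("b", "c"): "e2", ("a", "c"): "e3", ("c", "a"): "e4"}
    prov = path_provenance(edges)
    print(show(prov[("a", "c")]))  # e1*e2 + e3
    print(show(prov[("a", "a")]))  # e1*e2*e4 + e3*e4
```

The cycle `a -> c -> a` is what makes the choice of semiring matter. In the provenance-polynomial semiring N[X], `path(a, a)` would be the infinite formal power series e3·e4 + e1·e2·e4 + (e3·e4)² + ..., so this loop would never terminate. Absorption discards every derivation that strictly contains a smaller one, so only finitely many annotations exist and the fixpoint is reached.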