Data provenance (also called data lineage) describes how data is generated and how it evolves over time. It has many applications, including data quality assessment, audit trails, data recovery (replication recipes), and data citation. Provenance can be roughly divided into the derivation of data across different data sources and the derivation of data within a single data source, that is, schema-level and instance-level provenance. This paper surveys research on data provenance, organized around the representation and querying of provenance at the schema level and at the instance level. For the schema level, it focuses on provenance-tracking techniques based on query rewriting and schema mappings. For the instance level, it summarizes recent progress for relational data, XML data, and streaming data. It also reviews research on uncertain data provenance, which tracks both the derivation of data and its uncertainty. Finally, the paper lists applications of data provenance management, discusses the main challenges facing provenance analysis, and points out directions for future research.
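To make instance-level provenance for relational data concrete, here is a minimal Python sketch. It assumes a lineage-style annotation in which every result tuple carries the set of source-tuple identifiers that contributed to it. The relations `Employees` and `Depts`, the tuple IDs, and the helpers `annotate`, `join`, and `project` are hypothetical and serve only as illustration. This is not the method of any particular system surveyed here.

```python
from collections import defaultdict

# Hypothetical base relations: each row is (tuple_id, record).
Employees = [
    ("e1", {"name": "Ann", "dept": "D1"}),
    ("e2", {"name": "Bob", "dept": "D2"}),
    ("e3", {"name": "Cal", "dept": "D1"}),
]
Depts = [
    ("d1", {"dept": "D1", "city": "Paris"}),
    ("d2", {"dept": "D2", "city": "Oslo"}),
]

def annotate(rel):
    """Lift a base relation into (record, lineage) pairs."""
    return [(rec, frozenset({tid})) for tid, rec in rel]

def join(left, right, key):
    # A joined tuple derives from both inputs, so their lineages are unioned.
    return [
        ({**l, **r}, ll | rl)
        for l, ll in left
        for r, rl in right
        if l[key] == r[key]
    ]

def project(rel, attrs):
    # Projection with duplicate elimination: an output tuple's lineage is
    # the union of the lineages of all input tuples that collapse onto it.
    merged = defaultdict(frozenset)
    for rec, lin in rel:
        out = tuple(rec[a] for a in attrs)
        merged[out] = merged[out] | lin
    return dict(merged)

# Query: cities that have at least one employee.
result = project(join(annotate(Employees), annotate(Depts), "dept"), ["city"])
for row, lineage in sorted(result.items()):
    print(row, sorted(lineage))
```

Running it prints one line per city together with its lineage. The `Paris` row, for example, traces back to `d1`, `e1`, and `e3`. Because join and projection simply union the annotations, this model records *which* source tuples contributed but not how they combined. Richer models, such as why-provenance (sets of witnesses) or how-provenance (semiring polynomials), keep more of that structure.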