With the rise of data science, data has become an important strategic resource. In the big data environment, data is growing, accumulating, flowing, being replicated, and being distributed at an unprecedented pace. Because data is constantly evolving, verifying its original source has become more difficult, which more readily leads to data quality problems. Research on data provenance is therefore imperative. This paper presents a survey of data provenance, systematically reviewing its concepts, meaning, models, methods, techniques, systems, and applications, and points out that data provenance for mobile devices and the Internet of Things in the big data environment is a scientific problem that is both highly challenging and of great practical significance.