In the big data environment, the openness and multi-source nature of Web data resources cause the quality of data provided by different Internet platforms to vary widely, which seriously hinders people from obtaining information from the Internet effectively and accurately. To address this problem, this paper proposes a Web data source quality assessment method. It establishes a unified data model and a data quality standard model for multi-source Internet platforms, gives measurement and representation methods for the quality standards that are suited to full-sample analysis of big data, and achieves a unified measure of Web data source quality through comprehensive assessment of multidimensional data quality. Experimental results show that the method can comprehensively measure the data quality of Internet platforms and provide users with accurate and efficient quality evaluation results.