Data science takes big data as its object of study, and the most direct impact of big data on statistical analysis is the change in how data are collected. At the same time, the scope of statistical analysis is no longer limited to traditional attribute data; it now extends to relational, unstructured, and semi-structured data, along with other, richer data types. With the opening of data, the value of the linkage information between databases is gradually being recognized. From a statistical perspective, this paper studies the statistical connotation of data science along three dimensions: its scientific theoretical basis, computer processing technology, and business applications. It also explores the direct impact of the data science paradigm on the process of statistical analysis, as well as the opportunities and challenges facing statistics.