通常认为大数据是一个现有技术难以处理的复杂而庞大的数据集,这将导致一个谬误的出现:大数据都不能被处理,能处理的都不是大数据。显然,如何定义大数据是一个问题。分析了已有的大数据定义和现象,发现数据、技术和应用是大数据的三要素,定义大数据是为决策提供服务的大数据集、大数据技术和大数据应用的总称。其中,大数据集是指一个决策问题所用到的所有可能的数据,而不是个领域的所有数据。还给出了大数据应用遇到的问题及技术挑战,并指出大数据未来的研究方向。
Generally, big data is regarded as a term about data sets so large or complex that conventional data technologies cannot handle. This statement of big data leads to confusion: none of big data has been handled by existing data technologies; or none of current successful data applications can be called as big data. Therefore, what is the best way to define big data becomes a problem. Data, technology, and application were regarded as three associated key factors of big data by analyzing the state-of-the-art of big data. A comprehensive definition on big data was defined as the umbrella of big data set, big data technology, and big data application. Here, big data set means all data that can be acquired and were related to one decision-making application instead of all data in an area or an enterprise. In addition, the issues in big data applications and the main challenges in big data technologies were discussed. Finally, the future directions of big data research were presented including data science and the technologies of big data reservation and development.