The understanding of big data has evolved from the "3Vs" (volume, variety, and velocity) to the "4Vs" (the "3Vs" plus value) and now to the "5Vs" (the "4Vs" plus veracity). Against this background, the "5Vs" characteristics of big data in the process industries are first analyzed. Existing data modeling methods are then reviewed, and their limitations when applied to industrial big data are discussed in light of the specific properties of process industry data, namely multi-layer irregular sampling, multiple temporal and spatial time series, and contamination with non-veracious data such as outliers. Finally, open problems in process industry big data modeling are discussed, including: i) latent structure modeling of multi-layer irregularly sampled data; ii) modeling of multiple temporal and spatial time-series data for event discovery, decision-making, and causality analysis; iii) robust modeling of data containing non-veracious samples; and iv) large-volume data computing architectures and methods that support real-time modeling.