The advent of the big data era has increased the importance of visualization. Visual analytics draws on human cognitive abilities and strengths in perceiving information, combines human and machine capabilities, and uses human-computer interaction to gain insight into the information and patterns behind big data efficiently, which makes it an important method of big data analysis. In view of the characteristics of big data (large volume, high dimensionality, multiple sources and multiple forms), visualization methods for large-scale data, streaming data, and unstructured and heterogeneous data are discussed. First, visualization techniques for large-scale data are discussed: 1) the divide-and-conquer principle is used to decompose a large problem into smaller tasks, which are solved by parallel processing to improve processing speed; 2) aggregation, sampling and multi-resolution representation are used for data reduction; 3) for high-dimensional data, several views are selected so that different visualization results are generated from multiple perspectives. Then, the visualization process for streaming data is examined for two types of streams, monitoring-type and superposition-type. Finally, visualization techniques for unstructured data and heterogeneous data are described. In summary, visualization can compensate for the weaknesses of automated computer analysis, integrate the analytical power of computers with human perception of information, and effectively reveal the information and knowledge behind big data. However, theoretical research results on visualization remain very limited, and the field faces challenges of large data scale, dynamic change, high dimensionality and multi-source heterogeneity, which are gradually becoming the focus and direction of future research on big data visualization.
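The large-scale techniques listed above can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names (`bin_counts`, `parallel_bin_counts`, `coarsen`, `reservoir_sample`), the 2D grid binning and the choice of reservoir sampling are all assumptions made for illustration. The sketch splits a point set into partitions and aggregates each one in a worker (divide and conquer), merges cells into coarser blocks (multi-resolution representation), and draws a fixed-size uniform sample from a stream (sampling).

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def bin_counts(points, cell):
    """Aggregation: count the points that fall into each grid cell of side `cell`."""
    counts = Counter()
    for x, y in points:
        counts[(int(x // cell), int(y // cell))] += 1
    return counts


def parallel_bin_counts(points, cell, parts=4):
    """Divide and conquer: split the data, aggregate each part, merge the results.

    Threads keep the sketch short; for CPU-bound work in CPython a process
    pool (with a picklable worker function) would be the usual choice.
    """
    chunks = [points[i::parts] for i in range(parts)]
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partials = list(pool.map(lambda chunk: bin_counts(chunk, cell), chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts
    return total


def coarsen(counts, factor):
    """Multi-resolution: merge factor x factor blocks of cells into one cell."""
    coarse = Counter()
    for (i, j), n in counts.items():
        coarse[(i // factor, j // factor)] += n
    return coarse


def reservoir_sample(stream, k, seed=0):
    """Sampling: keep a uniform random sample of k items from a stream
    of unknown length (reservoir sampling, Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for n, item in enumerate(stream):
        if n < k:
            sample.append(item)
        else:
            j = rng.randint(0, n)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample
```

A renderer would then draw the binned counts (for example as a density map) instead of the raw points, switching to a coarser grid when the view is zoomed out. The cell size, coarsening factor and sample size are tuning parameters that the source does not specify.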