This paper reviews the latest developments in research on big data processing. First, to give an overall picture of the current state of research, existing big data processing systems are classified from two perspectives: workload type and data type. Second, advances in programming frameworks for batch processing are described in detail. The discussion focuses on frameworks for application domains such as large-scale graph analytics and distributed machine learning, covering their computational characteristics, the challenges they face, and their design principles. Finally, research hotspots and trends in big data processing are discussed. Three directions are identified as focal points for future research in this field: parallel training on heterogeneous hardware platforms, automatic parallelization of serial code, and hybrid programming.