大数据环境下的数据流处理实时性要求高,数据计算要求持续性和高可靠性。分布式数据流处理系统(DDSPS)能解决大数据环境下的数据流处理问题,它除具备分布式系统的可扩展性和容错性优势外,还具有高的实时处理能力。详细介绍了组成基于大数据的分布式数据流处理系统的四个子系统及其关键技术,讨论和比较了各个子系统的不同技术方案;同时介绍一种分布式拒绝服务(DDoS)攻击检测数据流处理系统结构案例,其研究内容能为大数据环境下的数据流处理理论研究和应用技术开发提供技术参考。
Data stream processing in big data environments demands strong real-time performance, and the computation must be continuous and highly reliable. Distributed Data Stream Processing Systems (DDSPS) address this problem: in addition to the scalability and fault tolerance of distributed systems, they offer high real-time processing capability. This paper describes in detail the four subsystems that make up a big-data-based DDSPS and their key technologies, and discusses and compares the alternative technical schemes for each subsystem. It also presents a case study of the architecture of a data stream processing system for detecting Distributed Denial of Service (DDoS) attacks. The work can serve as a technical reference for theoretical research on data stream processing and for application development in big data environments.