Stream processing is an important big-data application pattern and is widely used in finance, advertising, the Internet of Things, social networks, and many other fields. In streaming scenarios, the rate at which stream data is generated often fluctuates sharply and is hard to predict. If the traffic peak exceeds the capacity of the processing system, the system may slow down or even crash, causing processing jobs to fail. If resources are over-provisioned to handle the peak, they are wasted whenever the system is lightly loaded. To match stream processing load with resources, a stream processing system should be elastically scalable. On the one hand, it should organize computing resources efficiently. On the other hand, it should automatically adjust the amount of resources it uses according to real-time changes in data traffic. However, support for elastic scalability in existing stream processing frameworks is still preliminary, and how to design an elastically scalable system remains an open problem. This paper introduces eSault, an elastically scalable stream processing framework based on the Actor model. eSault first uses the Actor model to manage batches of processing units hierarchically and supports scalability through a two-layer routing mechanism. On this basis, it applies an overload judgment algorithm based on data processing delay and a light-load judgment algorithm based on data processing speed to guide the efficient use of resources, thereby achieving elastically scalable stream processing. Experimental results show that eSault performs well and achieves elastic scalability effectively.
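The abstract names eSault's mechanisms but does not give their details. The following is therefore only a minimal sketch, under stated assumptions, of how a delay-based overload check, a speed-based light-load check, and a two-layer router could fit together. It is not eSault's implementation. Every name is hypothetical: `ElasticityMonitor`, `TwoLayerRouter`, `delay_threshold_ms`, `window`, and `headroom`. The threshold values are illustrative only, and plain Python objects stand in for actors. The second routing layer uses round-robin, which assumes stateless operators.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional
import zlib


@dataclass
class ElasticityMonitor:
    """Hypothetical load detector; all thresholds are illustrative assumptions."""
    delay_threshold_ms: float = 500.0  # overload if delay stays above this
    window: int = 5                    # number of recent delay samples kept
    headroom: float = 0.7              # fraction of reduced capacity allowed
    delays: deque = field(default_factory=deque)

    def record_delay(self, delay_ms: float) -> None:
        # Keep only the most recent `window` samples.
        self.delays.append(delay_ms)
        if len(self.delays) > self.window:
            self.delays.popleft()

    def is_overloaded(self) -> bool:
        # Delay-based overload check: a full window in which every sample
        # exceeds the threshold, so a single spike does not trigger scale-out.
        return (len(self.delays) == self.window
                and all(d > self.delay_threshold_ms for d in self.delays))

    def is_light_load(self, input_rate: float, per_worker_speed: float,
                      workers: int) -> bool:
        # Speed-based light-load check: could one fewer worker, running at the
        # measured per-worker processing speed, absorb the input with headroom?
        if workers <= 1:
            return False
        return input_rate < self.headroom * per_worker_speed * (workers - 1)


class TwoLayerRouter:
    """Hypothetical two-layer router: key -> group, then group -> worker."""

    def __init__(self, groups: int, workers_per_group: int) -> None:
        self.groups = [[f"g{g}-w{w}" for w in range(workers_per_group)]
                       for g in range(groups)]
        self._cursor = [0] * groups  # round-robin position per group

    def route(self, key: str) -> str:
        # Layer 1: a stable CRC32 hash picks the group; scaling never changes it.
        g = zlib.crc32(key.encode()) % len(self.groups)
        # Layer 2: round-robin over the group's current workers.
        workers = self.groups[g]
        worker = workers[self._cursor[g] % len(workers)]
        self._cursor[g] += 1
        return worker

    def add_worker(self, g: int) -> str:
        # Scale out: only the second-layer worker list of group g changes.
        name = f"g{g}-w{len(self.groups[g])}"
        self.groups[g].append(name)
        return name

    def remove_worker(self, g: int) -> Optional[str]:
        # Scale in: keep at least one worker per group.
        if len(self.groups[g]) > 1:
            return self.groups[g].pop()
        return None


if __name__ == "__main__":
    monitor = ElasticityMonitor()
    for d in [620, 710, 580, 900, 650]:
        monitor.record_delay(d)
    print(monitor.is_overloaded())  # True: 5 consecutive samples above 500 ms
    # 100 < 0.7 * 80 * 3 = 168, so three workers would suffice -> True
    print(monitor.is_light_load(input_rate=100, per_worker_speed=80, workers=4))
    router = TwoLayerRouter(groups=2, workers_per_group=2)
    print(router.route("user-42"))
```

The two-layer split keeps scaling local. Adding or removing a worker touches only one group's second-layer list, while the key-to-group mapping in the first layer stays fixed. Requiring a full window of high delays before declaring overload is one simple way to avoid reacting to transient spikes.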