随着大数据处理的深入发展,系统单位时间内产生的数据日趋庞大,数据间的关联关系日趋复杂,这使得传统的“存储.查询”或者“发布一订阅”的方式无法很好地满足诸如故障监控、股票分析、医疗及生命保障等对大数据具有实时处理需求的系统.复杂事件处理技术实现的是将用户对特定的事件序列的查询需求映射到特定识别结构上.该结构从多个持续的数据流中分析并提取满足特定模式的事件序列.该技术能够很好地支持对大量数据进行实时在线分析.但由于在数据处理的过程中,系统不可能预置全部的查询语义浒多系统在使用过程中会需要使用新的语义,以查询新产生的模式.因此,一种支持扩展的语义的复杂事件处理模型是非常必要的.同时,现有的复杂事件处理模型仅针对某几类特定的查询进行描述以及优化,对整体模型缺乏统一描述,导致许多模型在多规则复杂查询的情况下效率欠佳.针对上述问题,提出了基于算子的可扩展复杂事件处理模型.该模型能够良好地支持现有的各类查询语义,具有较快的识别速度.基于该模型的形式化描述,对系统在识别过程中的性能消耗进行了详细分析,给出了模型构造最优算法.通过实验验证了算子模型优化方案的正确性.实验结果表明,经过优化后的树结构事件处理速度比开源复杂事件处理引擎Esper快3倍以上.
With the development of big-data computing, the system generated data becomes larger and more complex. Yet systems like fault monitoring, stock analyzing and health-care require processing these data in nearly real-time. The original data processing methods such as "save-query" and "publish-scribe" cannot handle the large volume of data in that speed. Complex event processing (CEP) is a data processing scheme that executes the user's real-time queries. It continually takes the high volume of raw data input and produces output for the corresponding data stream according to the queries. However in some practical environments, the data from system may generate many new patterns, and the CEP system cannot prepare for each of them. Consequently, an extendable CEP system is needed. Existing CEP work mainly focus on several special types of queries without a high level overview, therefore cannot easily guarantee the overall performances of the system. Though the NFA model poses a universal semantic, the sealability of the NFA model is still under discussed. To address these defects, an operator-based complex event processing model is proposed to support operator extension. In addition, a detailed analysis is conducted on time consumption of operator-based model and an optimal algorithm is presented. Finally, the correctness of optimization solutions is verified by experiments. Contrast experiments show that the optimized tree model is three times faster than onen-source Droiect Esoer.