针对深度报文检测中正则表达式模式匹配的状态表爆炸问题,提出并实现了一种集合交割的预编码方法(SI—precode),在正则表达式转换成DFA前对所有输入符号进行预编码,通过压缩输入,减少FSM中输入符号的种类,从而压缩状态转移表的空间.证明了预编码生成的状态机的正确性及其与原状态机的同态性.采用L7-filter模式进行实验表明SI-precode不仅提高了正则表达式的编译速度,针对单模式状态机,其状态转移表空间比不进行预编码压缩了87%~97%,50个模式的多模式状态机可压缩59%.预编码在软硬件结合体系结构下进行协议识别时不会对性能造成影响;对纯软件结构性能降低2%~4%.
Recent advances in network packet processing focus on payload inspection for applications that include content-based billing, layer-7 switching and Internet security. Most of the applications in this family compile the fingerprints to finite state machine (FSM), and then process packets based on state transition table. There are two kinds of FSM. deterministic finite automaton (DFA) and nondeterministic finite automaton (NFA). DFA is fast but needs more memory, while NFA needs little memory but is slow. There is little time for every packet to be processed in payload inspection, so DFA is always preferred. To solve the state transition table explosion problem of regular expression pattern matching using DFA, a set intersected precode (SI-precode) method is introduced. SI-precode codes all input symbols before the conversion from regular expressions to DFA, and the space of FSM state transition table is then reduced by compressing the input symbols. Experiments based on L7-filter are employed to show that the memory space would save 87% to 97% for single pattern FSM, and 59% for multiple-pattern FSM with 50 patterns. SI-precode doee not affect the performance in the case of the architecture combining software and hardware. For pure software implementation, the performance decreases 2% to 4 %.