正则表达式匹配在很多网络安全领域起着非常重要的作用。确定性有限自动机(DVA,deterministic finite automatonl具有线速稳定的匹配性能,因而更适合在高速网络环境下执行正则表达式匹配。但DFA可能由于状态膨胀而占用巨大的内存空间。作为状态膨胀问题的一种经典解决方案,i-DFA在大幅降低内存开销的同时,还能保证最差匹配性能。然而,已有方法构造i-DFA时在时间和空间上都是非常低效的。基于状态分组的思想,提出了一种高效的i-DFA构造方法。进一步地,对状态分组进行了形式化描述,并证明了获得最优状态分组是NP困难的,并基于局部搜索的思想提出了一种近优的状态分组算法。实验结果表明,相比经典的i-DFA构造方法,所做的工作在时间和空间上都有极大的改进:i-DFA的状态规模可能只是已有方法的2/3,而构造i-DFA所用时间仅是已有方法的1/16。
Regular expression matching plays an important role in many network and security applications. DFA is the preferred representation to perform regular expression matching in high-speed network, because of its high and stable matching efficiency. However, DFA may experience state explosion, and thus consume huge memory space. As a classi- cal solution for the problem of state explosion, i-DFA can reduce the memory consumption significantly and guarantee the worst matching performance at the same time. However, prior methods are inefficient in both time and space during the construction of i-DFA. An efficient i-DFA construction algorithm based on the idea of state grouping was proposed. Furthermore, a formal description for the problem of state grouping was given, and it was proved that it was NP-hard to get the best state grouping result. Thus, based on local search strategy, a near-optimal algorithm was introduced to divide states into different groups. Compared with the classical construction method, the significant improvement in both time and space is achieved; the i-DFA of the proposed method may have 2/3 states as that of prior method and the proposed i-DFA is constructed with only 1/16 time of it.