针对有限状态自动机DFA构造过程中出现状态爆炸导致存储空间大、匹配效率低等问题,提出一种基于规则分组及状态边压缩相结合的正则表达式引擎优化算法GCFA,通过将规则基于关联性进行分组,对各个分组所构造的联合DFA采用存储连续字符的范围代替单一字符以达到减少存储空间的目的。实验结果证明,与标准DFA构造算法相比较,GCFA算法对状态转移边的压缩率达到98%,与经典改进算法相比较,降低2个数量级的存储空间。
To solve the problems that state explosion, large storage space and low matching efficiency usually happen in the process of constructing the traditional finite state automaton such as NFA.DFA, proposes the regular expression engine optimization algorithm GCFA based on the combination of grouping rules and compressing the state transition edges. Though the analysis of the rules, combined the methods of grouping rules based on relevance and storing the contiguous characters instead of single ones when constructing the united DFAs in order to achieve the purpose of compacting the state transition edges and reduce the storage space. The results of experiments show that, compared with the standard DFA algorithm, GCFA makes the state transition edge's compression rate reach at 98%, and compared with the other classic construction algorithms of DFA, GCFA's storage space is reduced by 2 orders of magnitude.