Based on an analysis of the Boyer-Moore (BM) algorithm, a new variant of BM is proposed. Its basic idea is to build, in the preprocessing phase, the match heuristic (i.e., the good-suffix rule) for the extended pattern Pa, where P is the pattern and a is an arbitrary character of the alphabet. This both increases the length of the matched suffix and implicitly incorporates Sunday's occurrence heuristic (i.e., the bad-character rule), so the scanning window can be shifted by a larger distance. Theoretical analysis shows that the improved algorithm runs in linear time even in the worst case and in sublinear time on average, with space complexity O(m(σ + 1)), where m is the pattern length and σ the alphabet size. Experimental results show that its practical performance is significantly better than that of the BM algorithm, especially for small alphabets.
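The shift rule can be illustrated with a minimal Python sketch. It is one plausible reading of the abstract, not the authors' code. The sketch assumes that after each attempt the character a = text[j+m] just past the window is appended to P, and that the strong good-suffix table of Pa is looked up at the same mismatch index i. The good-suffix preprocessing follows the well-known Charras-Lecroq formulation of BM. The function names `suffixes`, `good_suffix_table`, `build_tables` and `search` are hypothetical. One table of m+1 entries is kept per character, which is in line with the O(m(σ + 1)) space bound. The sketch makes no attempt to reproduce the paper's worst-case time analysis.

```python
def suffixes(x):
    """suff[i] = length of the longest common suffix of x[:i+1] and x."""
    m = len(x)
    suff = [0] * m
    suff[m - 1] = m
    f, g = 0, m - 1
    for i in range(m - 2, -1, -1):
        if i > g and suff[i + m - 1 - f] < i - g:
            suff[i] = suff[i + m - 1 - f]  # reuse a previously computed value
        else:
            g = min(g, i)
            f = i
            while g >= 0 and x[g] == x[g + m - 1 - f]:
                g -= 1
            suff[i] = f - g
    return suff


def good_suffix_table(x):
    """Strong good-suffix shift gs[i] for a mismatch at x[i]; gs[0] is also
    the shift after a full match (the smallest period of x)."""
    m = len(x)
    suff = suffixes(x)
    gs = [m] * m
    j = 0
    # Case 1: a prefix (border) of x matches a suffix of the matched part.
    for i in range(m - 1, -1, -1):
        if suff[i] == i + 1:
            while j < m - 1 - i:
                if gs[j] == m:
                    gs[j] = m - 1 - i
                j += 1
    # Case 2: the matched suffix reoccurs in x with a different left neighbour.
    for i in range(m - 1):
        gs[m - 1 - suff[i]] = m - 1 - i
    return gs


def build_tables(pattern, alphabet):
    """Hypothetical helper: one good-suffix table per extended pattern P+a."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    chars = set(alphabet) | set(pattern)
    return {a: good_suffix_table(pattern + a) for a in chars}


def search(text, pattern, tables):
    """Return the start positions of all occurrences of pattern in text."""
    m, n = len(pattern), len(text)
    occurrences = []
    j = 0
    while j <= n - m:
        # Compare the window right to left, as in BM.
        i = m - 1
        while i >= 0 and pattern[i] == text[j + i]:
            i -= 1
        if i < 0:
            occurrences.append(j)
        if j + m >= n:  # no character after the window: last attempt
            break
        gs = tables.get(text[j + m])
        if gs is None:
            # text[j+m] does not occur in the pattern, so no occurrence can
            # cover that position: jump past it (shift m+1).
            j += m + 1
        else:
            # Pa = pattern + text[j+m] has the same mismatch index i; when
            # i < 0 (P, hence Pa, fully matched) shift by Pa's period gs[0].
            j += gs[i] if i >= 0 else gs[0]
    return occurrences


if __name__ == "__main__":
    tables = build_tables("aba", "ab")
    print(search("abaababaab", "aba", tables))  # [0, 3, 5]
```

The design choice is visible in the lookup. For a shift s ≤ m, an occurrence of P must satisfy every constraint that BM's good-suffix rule checks for P, plus P[m-s] = a. So the shift taken from the Pa table is never smaller than BM's good-suffix shift or Sunday's occurrence shift (m minus the last position of a in P, or m+1 if a does not occur in P). This is one way to read the claim that the rule "implies" Sunday's heuristic.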