东篱科研大数据发现系统（DRDS）

位置：成果数据库 > 期刊 > 期刊详情页

基于匹配区域特征的相似字符串匹配过滤算法

期刊名称：计算机研究与发展. 47(4). 663-670, 2010 (EI收录)
时间：0
分类：TP391.3[自动化与计算机技术—计算机应用技术;自动化与计算机技术—计算机科学与技术]
作者机构：[1]湖南大学计算机与通信学院,长沙410082
相关基金：国家“九七三”重点基础研究发展计划基金项目（2006CB303000）;国家自然科学基金重点项目（60736016）;国家自然科学基金项目（60573045,60873198,60973113,60973128）;国家“九七三”重点基础研究发展计划基金前期研究专项项目（2009CB326202）;高等学校博士学科点专项科研基金项目（20050532007）
相关项目：基于信息隐藏的传感器网络安全研究

关键词：相似字符串匹配, 过滤算法, 匹配区域特征, 过滤效率, q-gram, approximate string matching, filter algorithm, match-region feature, filtration efficiency, q-gram

中文摘要：

相似字符串匹配过滤算法因其适合大库查找而被广泛应用，为通过提高过滤算法的过滤效率加快匹配速度，提出一种基于匹配区域特征的过滤算法．该算法将模式串和文本串分割成固定长度为kq＋1的逻辑块，并从各块中提取了2个新的匹配区域特征：q-gram命中的均匀性和q—gram有效命中的区域性．新算法利用这些新特征优化了传统过滤标准，提高了算法的过滤效率；并改进了QUASAR中基于分块策略的过滤区确定方案．实验结果表明，新算法与改进前相比有效地加快了匹配速度，尤其在误差率较小时改进效果更佳．

英文摘要：

Approximate string matching is a basic problem in computer science. It is widely used in various areas. The aim of this study is to improve the speed of approximate string matching. Filter algorithm for approximate string matching is discussed because it is suitable for large-scale text searching. A novel filter algorithm based on match-region features is presented. Firstly, a q-gram index is used to preproeess text. Secondly, both pattern and text are logically divided into blocks of fixed size kq＋1, and then new match-region features are extracted from blocks, and the algorithm optimizes the fundamental q-gram filtration criterion by the new features. Finally, the improved method of choosing filtration-region based on QUASAR＇s block addressing is used for filtration. The experimental results demonstrate that the algorithm achieves higher matching speed than that of QUASAR＇s block addressing by way of improving filtration efficiency. In particular, its matching speed is much faster under low error rate. Experiments also reveal the relationship between matching speed and error rate of new algorithm. These results suggest that the algorithm is useful in a system for approximate string matching with low error rate. It is also powerful for long pattern approximate string matching on the condition of fixed edit distance k.

同期刊论文项目