Standard Bloom filters need to know the number of distinct elements in a data set before operation in order to determine the optimal number of hash functions, but the distribution of a data set is not easy to obtain in advance. This paper proposes Multi-stage Dynamic optimization Bloom Filters (MDBF), a Bloom filter that dynamically optimizes the number of hash functions in multiple stages. MDBF splits the element insertion process into several stages; in each stage it analyzes the distribution of the inserted elements from the usage of the bit vector and dynamically adjusts the number of hash functions toward the optimum. Experimental results show that MDBF adapts to complex cases with element multiplicity and skewed distributions, selects the optimal number of hash functions, and achieves a lower false positive rate.
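The idea of reading the element distribution off the bit vector can be illustrated with a minimal sketch. It is not the MDBF algorithm itself, whose details the abstract does not give. It assumes the textbook optimum k = (m/n) ln 2 and the standard fill-based cardinality estimate n ≈ -(m/k) ln(1 - X/m), where X is the number of set bits. The names `BloomFilter`, `optimal_k`, `estimate_distinct`, and `suggest_k_after_stage` are hypothetical, as is the double-hashing scheme built on SHA-256.

```python
import hashlib
import math


def optimal_k(m: int, n: float) -> int:
    """Textbook optimum k = (m / n) * ln 2, rounded, never below 1."""
    return max(1, round(m / n * math.log(2)))


def estimate_distinct(m: int, k: int, bits_set: int) -> float:
    """Estimate distinct insertions from the fill: -(m / k) * ln(1 - X / m)."""
    if bits_set >= m:
        return float("inf")  # a saturated vector carries no information
    return -(m / k) * math.log(1 - bits_set / m)


class BloomFilter:
    """Plain Bloom filter that also tracks how many bits are set."""

    def __init__(self, m: int, k: int) -> None:
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit, for clarity only
        self.bits_set = 0

    def _positions(self, item: str) -> list[int]:
        # Assumed scheme: double hashing h1 + i * h2 (mod m) from one digest.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            if not self.bits[p]:
                self.bits[p] = 1
                self.bits_set += 1

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p] for p in self._positions(item))


def suggest_k_after_stage(stream, m: int, k_initial: int):
    """Insert one stage of the stream, then suggest k from the observed fill."""
    bf = BloomFilter(m, k_initial)
    for item in stream:
        bf.add(item)
    n_hat = estimate_distinct(m, k_initial, bf.bits_set)
    return n_hat, optimal_k(m, max(n_hat, 1.0))


# A skewed stream: 10,000 insertions but only 500 distinct keys.
stream = [f"key{i % 500}" for i in range(10_000)]
n_hat, k_next = suggest_k_after_stage(stream, m=8192, k_initial=4)
```

Because repeated elements set no new bits, the estimate tracks the number of distinct keys rather than the raw insertion count, which is what makes the fill ratio usable under element multiplicity. The sketch deliberately leaves open how lookups stay consistent once the number of hash functions changes between stages.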