布鲁姆过滤器(Bloom filter)对数据集合采用一个位串表示并能有效支持元素的哈希查找,是一种精简的信息表示方案,广泛应用于数据库、网络和分布式系统中.本文研究布鲁姆过滤器的序列分析方法,通过定义布鲁姆过滤器距离,用概率统计方法分析动态数据集合元素增加和删除的变化对布鲁姆过滤器的影响.提出了基于计数式布鲁姆过滤器距离的集合变动定量评估算法.理论分析和仿真实验表明,该评估算法评估准确率高达90%以上.
A Bloom filter is a simple space-efficient randomized data structure for representing a data set in order to support membership queries with allowable errors. Bloom filters hash each element of the set to a vector of indices using a group of hash functions, and can filter out effectively any element that does not belong to the set. It is widely used in databases, networks and distributed systems. This paper first focuses on Bloom filter vector series analysis. The Bloom filter distance is also defined. Based on statistics analysis of the dynamic change of data set to the bloom filter, a quantitative assessment algorithm for the dynamic change of data set based on bloom filter distance (QABFD) is presented. Experimental and theoretical results show that the QABFD has 99% evaluation accuracy rate.