针对数据流的动态特性,提出了一种基于移动指针的数据流冗余消除算法——SKIP Bloom filter,其核心思想是通过动态指针和双Bloomfilter来区分历史数据映射与当前数据映射,从而有效提升了算法的性能和准确度。理论证明,它具有O(n)的时间复杂度与O(1—1/(2m))w.k)k的假阳性误判率。实验结果表明,算法在实际网络环境中与已有算法相比,准确度提高了2~12倍。
According to the dynamic characteristics of data streams, a duplicate elimination algorithm was proposed with low time complexity and high accuracy based on SKIP Bloom filter. A moving cursor and double Bloom filter were used to differentiate history data and current data mapping. Theoretically, it proves that the algorithm has the time complexity of O(n) and the false positive rate of O (1- (1-1/(2m))w.k)k . The experiment shows that the new SKIP Bloom filter im- proves the accuracy of 2-12 times in real networks compared with other existing algorithm.