File-level deduplication with a single Bloom filter can remove duplicate data only at whole-file granularity, while block-level deduplication with a single Bloom filter is too time-consuming. To address both shortcomings, this paper uses two Bloom filters to build a two-level deduplication structure that operates at the file level and the block level. Experimental results show that the dual-Bloom-filter algorithm deduplicates data at block granularity, keeps the false positive rate low, and reduces running time by 43%-68% compared with the block-level single-Bloom-filter algorithm.
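The two-level structure described above can be illustrated with a minimal Python sketch. It is not the paper's implementation. The class names `BloomFilter` and `TwoLevelDeduplicator`, the fixed-size blocking, the SHA-1 fingerprints, the double-hashing scheme, and the default parameters (`block_size`, `num_bits`, `num_hashes`) are all assumptions made for illustration. The sketch also assumes that a Bloom filter hit is treated directly as a duplicate, so a false positive would cause new data to be skipped.

```python
import hashlib
from typing import Iterator, List


class BloomFilter:
    """Minimal Bloom filter; bit positions come from double hashing (an assumption)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes) -> Iterator[int]:
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # forced odd so the stride is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        # True means "probably seen"; False means "definitely not seen".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


class TwoLevelDeduplicator:
    """Hypothetical two-level deduplicator: file-level filter first, then block-level."""

    def __init__(self, block_size: int = 4096, num_bits: int = 1 << 20,
                 num_hashes: int = 7) -> None:
        self.block_size = block_size
        self.file_filter = BloomFilter(num_bits, num_hashes)
        self.block_filter = BloomFilter(num_bits, num_hashes)

    def add_file(self, data: bytes) -> List[bytes]:
        """Return the blocks of `data` that are treated as new and should be stored."""
        file_fp = hashlib.sha1(data).digest()
        if file_fp in self.file_filter:
            # Level 1 hit: the whole file is treated as a duplicate,
            # so no per-block fingerprinting is done.
            return []
        self.file_filter.add(file_fp)

        new_blocks = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            block_fp = hashlib.sha1(block).digest()
            if block_fp not in self.block_filter:
                # Level 2 miss: this block has definitely not been seen before.
                self.block_filter.add(block_fp)
                new_blocks.append(block)
        return new_blocks


if __name__ == "__main__":
    dedup = TwoLevelDeduplicator(block_size=4)
    print(dedup.add_file(b"AAAABBBBCCCC"))  # first time: all three blocks are new
    print(dedup.add_file(b"AAAABBBBCCCC"))  # same file again: skipped at the file level
    print(dedup.add_file(b"AAAADDDD"))      # new file that shares one block with the first
```

In this sketch, a file that was already seen is rejected after one fingerprint and one filter lookup, which is one plausible source of the time savings over a purely block-level scheme. Files that are new but share blocks with earlier files are still deduplicated at block granularity.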