为了提升多维元素成员查询的灵活性和准确率,提出了一种新型索引结构CBFM(cutted Bloom filter matrix)。该索引方法通过独立属性布鲁姆过滤器笛卡尔乘积构建位矩阵,支持任意属性组合的多维元素成员查询,同时支持属性组合按需删减和属性加权,极大地提升内存空间利用率,降低查询误判率。理论分析证明相比于BFM(Bloom filter matrix)索引方法,CBFM具有更高的内存利用率。仿真实验表明,在分配内存相同的情况下,CBFM方法相比于其他方法,具有最低的查询误判率,特别在内存受限场景下,CBFM相比于BFM方法,查询误判率最大降低3个数量级,极大地提升了多维元素成员查询的准确率。
In order to improve the flexibility and accuracy of multi-dimensional membership query, a new indexing structure called CBFM(cutted Bloom filter matrix) was proposed. CBFM built the bit matrix by the Cartesian product of different bloom filters, each representing one attribute of primary data. In this way, the proposed matrix supported by-attribute membership query. Besides, the attribute combinations in the bit matrix could be reduced and weighted on demand to further enhance memory utilization rate. Theoretical analysis proves that CBFM utilizes memory more efficiently than BFM, the current state of art. Experiments also show that, on the scenario of memory size fixed, the false positive rate of CBFM is lower than that of all other indexing methods. Especially on the scenario of memory constrained, the false positive rate of CBFM can be 3 orders of magnitude lower than that of BFM(Bloom filter matrix) indexing method. CBFM is an accurate data structure for multi-dimensional membership query.