由于传统局部敏感散列(LSH)算法的删除性能不足,阻碍了LSH算法在实际产品中的应用.提出一种基于压缩位图的改进方法,通过引入压缩位图改良传统LSH算法的桶中数据结构,以及使用标记清除策略进行算法流程优化,解决传统LSH索引实时删除性能差的问题.理论分析证明:基于压缩位图的LSH(CB-LSH)算法可以显著降低算法的空间复杂度和时间复杂度.实验结果支撑了理论分析的结论,相对于传统LSH算法,CB-LSH在降低内存消耗的同时,可显著提高索引删除、数据插入和数据查询的性能.在大型项目中的应用实践验证了在线实时更新的海量多媒体数据检索系统中,CB-LSH索引算法对于多媒体数据的高维索引是有效可行的,并显著提升了性能、降低了资源消耗.
An improvement method based on compressed bitmap was proposed to address the deficiency of deletion performance in traditional locality-sensitive hashing(LSH) algorithms.To improve the overall performance especially the delete performance of LSH algorithms,a novel Boolean algebra schema for LSH operations was designed and the vector structure in hash-buckets was replaced with compressed bitmaps.In addition,a mark-and-sweep strategy was utilized to perform a batch deletion and further improved the delete performance.Theoretical analysis showed the improvements of CB-LSH in both memory space requirements and time complexity of all the LSH operations.Thorough experiments showed that CB-LSH not only saved notable memory space,but also achieved significant improvements in delete,query and insert performances.Industrial practice also verified that CB-LSH is qualified for online updates and queries for content-based image retrieval in large-scale multimedia databases,and CB-LSH significantly improves the overall performance and optimally utilizes the resources.