High-dimensional vector retrieval plays an important role in pattern recognition, computer vision, and information retrieval. Locality-sensitive hashing (LSH), which maps data points through random projections, is currently the mainstream solution to this problem. It is fast, but its results are strongly affected by randomness. To reduce this randomness, this paper proposes a weakly random retrieval method based on voting across multiple hash tables. The method first projects all data points randomly. It then computes similarities to obtain a retrieval vector for each hash table, stacks the retrieval vectors from the multiple hash tables into a matrix, and finally performs frequency voting over the elements of each column of this matrix to obtain the final index. Experiments show that the method combines information from multiple hash tables to reduce the randomness of LSH, and that it yields results close to the true (exact) nearest neighbors.
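The pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes Euclidean distance and the p-stable (Gaussian) projection hash h(v) = floor((a·v + b)/w). It treats each row of the matrix as one table's top-k retrieval vector and each column as one rank position. It also pads short rows with -1 and skips indices already chosen by an earlier column. All names (`HashTable`, `vote_columns`, `multi_table_search`) and parameters (`num_tables`, `num_hashes`, `bucket_width`) are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


class HashTable:
    """One LSH table of K concatenated Gaussian random projections (assumed scheme)."""

    def __init__(self, dim, num_hashes, bucket_width, rng):
        self.w = bucket_width
        # h(v) = floor((a . v + b) / w) with a ~ N(0, I) and b ~ U[0, w).
        self.a = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_hashes)]
        self.b = [rng.uniform(0.0, bucket_width) for _ in range(num_hashes)]
        self.buckets = defaultdict(list)

    def key(self, v):
        # Concatenate the K hash values into a single bucket key.
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.w)
            for a, b in zip(self.a, self.b)
        )

    def insert(self, idx, v):
        self.buckets[self.key(v)].append(idx)

    def retrieve(self, data, q, k):
        """Retrieval vector: up to k bucket-mates of q ranked by distance, padded with -1."""
        cands = self.buckets.get(self.key(q), [])
        ranked = sorted(cands, key=lambda i: math.dist(data[i], q))[:k]
        return ranked + [-1] * (k - len(ranked))


def vote_columns(matrix):
    """Frequency voting down each column (rank position) of the L x k matrix."""
    result = []
    for col in zip(*matrix):
        # Ignore padding and indices already chosen for an earlier rank (assumption).
        counts = Counter(i for i in col if i != -1 and i not in result)
        if counts:
            # most_common(1) returns the first-seen index among ties.
            result.append(counts.most_common(1)[0][0])
    return result


def multi_table_search(data, q, k, num_tables=8, num_hashes=4, bucket_width=10.0, seed=0):
    rng = random.Random(seed)
    tables = []
    for _ in range(num_tables):
        table = HashTable(len(q), num_hashes, bucket_width, rng)
        for idx, v in enumerate(data):
            table.insert(idx, v)
        tables.append(table)
    # Each row of the matrix is the retrieval vector of one hash table.
    matrix = [table.retrieve(data, q, k) for table in tables]
    return vote_columns(matrix)


# Usage: the query is a copy of point 42, so every table ranks 42 first.
demo_rng = random.Random(1)
data = [[demo_rng.uniform(-5.0, 5.0) for _ in range(16)] for _ in range(200)]
result = multi_table_search(data, list(data[42]), k=5)
print(result)
```

Voting per column lets an index that one table ranks by chance lose to an index that most tables agree on. This is the intended way of damping the randomness of any single table. Rows shorter than k (sparse buckets) simply contribute fewer votes.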