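Existing hashing algorithms introduce randomness when learning hash functions, so the resulting hash codes vary considerably from run to run. Building on ensemble learning theory and parallel computing, this paper proposes an unsupervised ensemble hashing (UEH) algorithm. First, we observe that classical hashing algorithms such as SKLSH and ITQ do not yield a unique Hamming ranking, because the hash functions learned at different times are not unique; that is, diversity exists among them. Second, ensemble learning is used to balance the diversity among the hash codes and thereby reduce the quantization error. In particular, an ensemble performs better when its base learners are both highly accurate and highly diverse, so we adopt bootstrapping, randomly generating multiple training subsets to increase diversity and further improve the generalization ability of the algorithm. Image retrieval experiments on the CIFAR-10 and MNIST image datasets show that the proposed algorithm outperforms other related algorithms by 6% to 15%.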
The diversity present in most recent hashing methods means that the learned binary codes cannot efficiently preserve data similarity. Taking ensemble learning theory and parallel algorithms as its foundation, this paper proposes a novel hashing method, Unsupervised Ensemble Hashing Learning (UEH). First, the ensemble method is used to balance the diversity and so reduce the quantization error. In particular, the higher the accuracy and the larger the diversity of the base learners, the more effective the ensemble method is. Then, bootstrap aggregating (bagging) is used to increase the diversity. Finally, iterative quantization is used to ensure that each hash bit carries an equal amount of information, which effectively enhances the generalization ability. The method is validated on image retrieval with two large-scale datasets, CIFAR-10 and MNIST, and the experimental results show that it improves performance by 6% to 15% over state-of-the-art methods. In addition, an important benefit of the bagging scheme for hashing is that it is inherently favorable to parallel computing.
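
As a rough illustration of the bagging idea described above, the following Python sketch trains several base hashers on bootstrap subsets and combines their bits. It is not the paper's implementation. The base hasher is a PCA projection followed by a random rotation (a simplified stand-in for ITQ, without its iterative refinement), combining the learners by concatenating their codes is an assumption, and all function names and parameters (`fit_base_hasher`, `fit_ensemble`, `sample_frac`, and so on) are hypothetical.

```python
import numpy as np

def fit_base_hasher(X, n_bits, rng):
    """Fit one base hasher: PCA projection plus a random rotation.

    Simplified stand-in for ITQ (assumption): ITQ would refine the
    rotation iteratively, which is omitted here for brevity.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    W = vt[:n_bits].T  # shape (d, n_bits); requires n_bits <= d
    # Random orthogonal rotation from the QR factorization of a Gaussian matrix.
    R, _ = np.linalg.qr(rng.standard_normal((n_bits, n_bits)))
    return mean, W @ R

def encode(X, hasher):
    """Binarize the projected data by sign."""
    mean, P = hasher
    return ((X - mean) @ P > 0).astype(np.uint8)

def fit_ensemble(X, n_learners, n_bits, sample_frac=0.5, seed=0):
    """Train each base hasher on its own bootstrap subset (bagging)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    m = max(n_bits, int(sample_frac * n))
    hashers = []
    for _ in range(n_learners):
        idx = rng.integers(0, n, size=m)  # sample with replacement
        hashers.append(fit_base_hasher(X[idx], n_bits, rng))
    return hashers

def encode_ensemble(X, hashers):
    """Combine base learners by concatenating their codes (assumption)."""
    return np.hstack([encode(X, h) for h in hashers])

def hamming_rank(query_code, db_codes):
    """Return database indices sorted by Hamming distance to the query."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists, kind="stable")

if __name__ == "__main__":
    data = np.random.default_rng(1).standard_normal((200, 16))
    hashers = fit_ensemble(data, n_learners=4, n_bits=8)
    codes = encode_ensemble(data, hashers)
    print(codes.shape)                      # 4 learners x 8 bits per point
    print(hamming_rank(codes[0], codes)[:5])
```

Because each base hasher in this sketch is fit from its own bootstrap subset and shares no state with the others, the loop in `fit_ensemble` could in principle be distributed across workers, which matches the abstract's remark that bagging is favorable to parallel computing.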