Objective Hashing is an effective method for large-scale image retrieval. To improve retrieval accuracy, hash codes should preserve semantic information: the more similar two images are, the closer their hash codes should be. Existing methods first extract a feature that describes the image as a whole and then generate a hash code. Such methods cannot precisely characterize the multiple objects an image may contain, which limits the accuracy of multi-label image retrieval. This study therefore proposes a hash generation method based on convolutional neural networks and object proposals. Method We propose a new deep-network-based framework that learns hash functions directly from multi-label images. The model first extracts a series of regions that may contain objects, then uses a deep convolutional neural network to extract the features of each region and fuses them, producing a group of features that characterizes every object in the image; finally, it generates a compact hash code for the entire image. A triplet-loss-based training method is adopted so that the hash codes preserve as much semantic information as possible. Result Multi-label image retrieval experiments were conducted on the VOC2012, Flickr25K, and NUSWIDE datasets. In terms of NDCG (normalized discounted cumulative gain) with 1 000 returned images, our method improves on DSRH (deep semantic ranking hashing) by 2 to 4 percentage points and on ITQ-CCA (iterative quantization-canonical correlation analysis) by 3 to 6 percentage points on VOC2012; it outperforms DSRH by approximately 2 percentage points on Flickr25K and by approximately 4 percentage points on NUSWIDE. In terms of mean average precision, our method gains 2 to 5 percentage points on Flickr25K and NUSWIDE. These results show that the proposed method describes images accurately at a fine granularity and significantly improves multi-label image retrieval performance. Conclusion This study proposes a new feature learning model; the experiments show that fine-grained feature encoding of images is practicable and effectively improves retrieval performance on these datasets.
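The pipeline sketched in the abstract (region proposals → per-region CNN features → fusion → hash code) can be illustrated as follows. This is a minimal sketch under stated assumptions, not the paper's exact architecture: it assumes a ResNet-18 backbone, torchvision's roi_align for per-region features, mean pooling as the fusion step, and a 48-bit tanh-relaxed hash layer; the ProposalHashNet class and these choices are hypothetical, and proposal generation (e.g., selective search) is assumed to happen outside the model, with boxes passed in directly.

# Minimal sketch of a proposal-based hashing pipeline (assumptions noted above).
import torch
import torch.nn as nn
import torchvision
from torchvision.ops import roi_align

class ProposalHashNet(nn.Module):
    def __init__(self, code_length=48):
        super().__init__()
        backbone = torchvision.models.resnet18(weights=None)
        # Keep the convolutional layers only; the output stride is 32.
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        self.hash_layer = nn.Linear(512, code_length)

    def forward(self, images, proposals):
        # images: (N, 3, H, W); proposals: list of N tensors, each (K_i, 4)
        # holding (x1, y1, x2, y2) boxes in input-image coordinates.
        fmap = self.features(images)                   # (N, 512, H/32, W/32)
        regions = roi_align(fmap, proposals,
                            output_size=(1, 1),
                            spatial_scale=1.0 / 32)    # (sum K_i, 512, 1, 1)
        regions = regions.flatten(1)                   # one feature per object region
        # Fuse region features into one image-level descriptor (mean pooling
        # here; the paper's actual fusion step may differ).
        sizes = [p.shape[0] for p in proposals]
        fused = torch.stack([r.mean(dim=0) for r in regions.split(sizes)])
        # tanh keeps activations in (-1, 1); sign() at retrieval time
        # yields the binary code.
        return torch.tanh(self.hash_layer(fused))

net = ProposalHashNet(code_length=48)
imgs = torch.randn(2, 3, 224, 224)
boxes = [torch.tensor([[0., 0., 100., 100.], [50., 60., 200., 180.]]),
         torch.tensor([[10., 10., 120., 220.]])]
codes = net(imgs, boxes)    # (2, 48) relaxed codes for training
binary = codes.sign()       # binary hash codes used at retrieval time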
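The abstract does not spell out the exact form of the triplet objective; a common triplet ranking loss for hashing, of the kind used in DSRH and plausibly followed here, is

\[
\mathcal{L}(q, p, n) = \max\bigl(0,\; m + d_H\!\left(h(q), h(p)\right) - d_H\!\left(h(q), h(n)\right)\bigr)
\]

where q is a query image, p is more semantically similar to q than n (e.g., it shares more labels with q), h(\cdot) is the learned hash code, d_H is the Hamming distance, and m is a margin. Minimizing this loss pushes p closer to q than n in Hamming space, so the ranking of hash-code distances preserves the semantic ranking.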
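For reference, the NDCG metric reported in the experiments is defined at cutoff k (here k = 1000) as

\[
\mathrm{DCG}@k = \sum_{i=1}^{k} \frac{2^{r_i} - 1}{\log_2(i + 1)}, \qquad
\mathrm{NDCG}@k = \frac{\mathrm{DCG}@k}{\mathrm{IDCG}@k}
\]

where r_i is the relevance of the i-th returned image (in multi-label retrieval, typically the number of labels shared with the query) and IDCG@k is the DCG of the ideal ranking, so NDCG@k lies in [0, 1] and rewards placing highly relevant images near the top of the ranked list.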