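This paper explores how to exploit the large volume of image and text information on the Internet, in a data-driven way, to annotate images automatically, and proposes an automatic image annotation framework based on searching for semantically related regions. The framework first uses Image-Net, a manually built visual and textual knowledge base, to train a set of weak classifiers. It then applies the learned weak classifiers to segmented image regions to generate region-level semantic feature representations, which are used to search a large-scale image database for related image regions. Finally, it produces the annotation result by clustering and mining the text descriptions of the search results. Compared with image-level low-level feature representations, the classifier-based region representation has stronger semantic expressiveness and better robustness, and more readily captures the multiple semantics of the multiple objects an image contains; the framework thereby combines the advantages of large-scale data-driven approaches and traditional classification-based approaches. Experiments on a large number of web images and on widely recognized test datasets demonstrate the effectiveness of the proposed framework.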
A novel framework for image annotation is proposed that builds on the abundant partially annotated images available on the web. Using both the visual and textual knowledge of the publicly available image database Image-Net, the framework first learns a set of weak visual concept classifiers. It then applies these classifiers to the segmented regions of a query image and uses their outputs as region descriptors to conduct a region-based search in a large-scale image database. The search results are then mined and clustered to generate annotations for the query image. Compared with an image-level representation, the proposed region-based semantic representation is better at capturing the multiple objects and semantics in an image. The framework thus combines the advantages of traditional classification-based approaches and large-scale data-driven approaches. Experiments conducted on 2.4 million web images and a challenging image database demonstrate the effectiveness and efficiency of the proposed approach.
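The three stages described in the abstract (classifier outputs as region descriptors, region-based search, and mining of the retrieved text) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy linear classifiers, the cosine similarity measure, the in-memory index, and all function names are hypothetical, and simple tag voting stands in for the clustering-based mining step.

```python
import math
from collections import Counter

def make_linear_classifier(weights):
    """Hypothetical weak concept classifier: a sigmoid over a linear score."""
    def classify(features):
        score = sum(w * x for w, x in zip(weights, features))
        return 1.0 / (1.0 + math.exp(-score))
    return classify

def region_descriptor(features, classifiers):
    """Describe a region by the outputs of all concept classifiers."""
    return [clf(features) for clf in classifiers]

def cosine(a, b):
    """Cosine similarity (an assumed choice of similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def search_regions(query_desc, index, k):
    """Return the k indexed regions most similar to the query descriptor."""
    ranked = sorted(index, key=lambda r: cosine(query_desc, r["desc"]),
                    reverse=True)
    return ranked[:k]

def annotate(query_regions, classifiers, index, k=2, top_n=2):
    """Vote over the tags of retrieved regions (stand-in for clustering)."""
    votes = Counter()
    for features in query_regions:
        desc = region_descriptor(features, classifiers)
        for hit in search_regions(desc, index, k):
            votes.update(hit["tags"])
    return [tag for tag, _ in votes.most_common(top_n)]

# Toy data: two concept classifiers and four indexed regions with text tags.
classifiers = [make_linear_classifier((4.0, -4.0)),   # "sky-like" concept
               make_linear_classifier((-4.0, 4.0))]   # "grass-like" concept
index = [
    {"desc": [0.9, 0.1], "tags": ["sky", "cloud"]},
    {"desc": [0.8, 0.2], "tags": ["sky", "blue"]},
    {"desc": [0.1, 0.9], "tags": ["grass", "field"]},
    {"desc": [0.2, 0.8], "tags": ["grass", "dog"]},
]
# A query image segmented into two regions, each with 2-D low-level features.
query_regions = [(1.0, 0.0), (0.0, 1.0)]
annotations = annotate(query_regions, classifiers, index)
print(annotations)
```

Because each region is described and searched separately, an image containing both sky and grass can collect tags for both objects, which is the advantage the abstract attributes to region-level over image-level representation.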