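In content-based image retrieval, image annotation plays a very important role. Because annotating images is costly, researchers exploit large amounts of unlabeled data to improve classification performance, and label propagation is one effective family of such methods. With advances in data acquisition and storage technology, digital images accumulate ever more easily, yet existing label propagation methods have difficulty handling large-scale real-world data. Therefore, for large-scale image annotation, we combine label propagation with random forests and propose a new method, RFLP. RFLP uses random decision trees to compress the samples, so that a traditional label propagation method can run efficiently on the compressed instances and exploit unlabeled data to improve classification performance; a random forest then generalizes the label propagation results to all unlabeled instances. Experimental results show that RFLP scales markedly better than traditional label propagation methods while achieving good classification performance.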
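Traditional label propagation struggles at scale because it builds a graph over all instances. The sketch below uses a generic iterative variant (Gaussian edge weights, labeled instances clamped to their labels). It is an assumption for illustration, not necessarily the specific method the authors compare against, and `gaussian_weights`, `label_propagation`, `sigma` and `iters` are hypothetical names. The dense n × n weight matrix makes memory and per-iteration time grow quadratically with the number of instances.

```python
import math


def gaussian_weights(points, sigma=1.0):
    """Dense n x n affinity matrix; this O(n^2) graph limits scalability."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return w


def label_propagation(points, labels, n_classes, sigma=1.0, iters=200):
    """labels[i] is a class index, or None for an unlabeled instance.
    Returns the predicted class index of every instance."""
    w = gaussian_weights(points, sigma)
    n = len(points)
    # Start from one-hot rows for labeled instances, uniform rows otherwise.
    f = [[1.0 / n_classes] * n_classes if y is None
         else [1.0 if c == y else 0.0 for c in range(n_classes)]
         for y in labels]
    for _ in range(iters):
        new_f = []
        for i in range(n):
            total = sum(w[i])
            if labels[i] is not None or total == 0:
                new_f.append(f[i])  # clamp labeled (and isolated) instances
            else:
                # Weighted average of the neighbours' class distributions.
                new_f.append([sum(w[i][j] * f[j][c] for j in range(n)) / total
                              for c in range(n_classes)])
        f = new_f
    return [max(range(n_classes), key=lambda c: row[c]) for row in f]
```

For two well-separated 1-D clusters with one labeled point each, every unlabeled point takes the label of its own cluster. Doubling the data, however, quadruples the size of `w`, which is exactly the cost RFLP aims to avoid.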
Image annotation plays an important role in content-based image retrieval. Since annotating images is expensive, researchers have proposed many methods that exploit the large amount of unlabeled data to improve classifier performance. Among these methods, label propagation has proven effective in many applications. With the proliferation of digital photography, the number of images is growing very rapidly; however, existing label propagation approaches cannot handle large-scale real-world problems because they must construct a graph structure over all instances. In this paper, we propose a novel large-scale image annotation algorithm, called RFLP, which combines the strengths of random forests and label propagation. We use random forests because they scale and generalize well, and because the locality of decision trees allows large-scale data to be compressed. First, RFLP uses random decision trees to reduce the large-scale problem to a small-scale one. Next, a traditional label propagation approach propagates labels efficiently on the compressed data. Finally, a random forest spreads the propagation results to all unlabeled instances. Experimental results show that, compared with traditional label propagation methods, the proposed RFLP is effective and significantly less costly.
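The three steps (compress with random decision trees, propagate on the compressed data, spread with the forest) can be sketched as follows. The abstract does not say how leaves become compressed instances, which propagation variant is used, or how the trees' outputs are combined, so this is a minimal sketch under stated assumptions. Each tree is grown with random features and random thresholds, ignoring labels. Each leaf becomes one super-instance at its centroid, seeded with the label frequencies of its labeled members. Label propagation then runs on the leaf centroids. Finally, an instance's label is the average of its leaf distributions over all trees. All function and parameter names (`build_random_tree`, `route`, `propagate`, `rflp`, `leaf_size`, `n_trees`) are hypothetical.

```python
import math
import random


def build_random_tree(points, idx, rng, leaf_size, leaves):
    """Grow a random decision tree (random feature, random threshold, no labels).
    Each leaf's instance indices are appended to `leaves`; the leaf node stores its index."""
    feats = [f for f in range(len(points[0]))
             if min(points[i][f] for i in idx) < max(points[i][f] for i in idx)]
    if len(idx) > leaf_size and feats:
        f = rng.choice(feats)
        lo = min(points[i][f] for i in idx)
        hi = max(points[i][f] for i in idx)
        t = rng.uniform(lo, hi)
        left = [i for i in idx if points[i][f] <= t]
        right = [i for i in idx if points[i][f] > t]
        if left and right:
            return ("split", f, t,
                    build_random_tree(points, left, rng, leaf_size, leaves),
                    build_random_tree(points, right, rng, leaf_size, leaves))
    leaves.append(idx)
    return ("leaf", len(leaves) - 1)


def route(node, x):
    """Follow the splits down to the leaf that instance x falls into."""
    while node[0] == "split":
        _, f, t, left, right = node
        node = left if x[f] <= t else right
    return node[1]


def propagate(centroids, seeds, n_classes, sigma, iters):
    """Label propagation on the compressed graph (one node per leaf).
    seeds[k] is a clamped class distribution, or None for a leaf with no labels."""
    m = len(centroids)
    w = [[0.0 if a == b else
          math.exp(-sum((p - q) ** 2 for p, q in zip(centroids[a], centroids[b]))
                   / (2 * sigma ** 2))
          for b in range(m)] for a in range(m)]
    f = [s if s is not None else [1.0 / n_classes] * n_classes for s in seeds]
    for _ in range(iters):
        new_f = []
        for a in range(m):
            total = sum(w[a])
            if seeds[a] is not None or total == 0:
                new_f.append(f[a])  # clamp seeded leaves; keep isolated leaves as is
            else:
                new_f.append([sum(w[a][b] * f[b][c] for b in range(m)) / total
                              for c in range(n_classes)])
        f = new_f
    return f


def rflp(points, labels, n_classes, n_trees=5, leaf_size=2, sigma=1.0,
         iters=100, seed=0):
    """Return a predict(x) function; labels[i] is a class index or None."""
    rng = random.Random(seed)
    dims = range(len(points[0]))
    forest = []
    for _ in range(n_trees):
        leaves = []
        root = build_random_tree(points, list(range(len(points))), rng,
                                 leaf_size, leaves)
        # Compression: each leaf becomes one super-instance at its centroid.
        centroids = [[sum(points[i][d] for i in leaf) / len(leaf) for d in dims]
                     for leaf in leaves]
        seeds = []
        for leaf in leaves:
            counts = [0] * n_classes
            for i in leaf:
                if labels[i] is not None:
                    counts[labels[i]] += 1
            total = sum(counts)
            seeds.append([c / total for c in counts] if total else None)
        forest.append((root, propagate(centroids, seeds, n_classes, sigma, iters)))

    def predict(x):
        # Spreading: average the leaf distributions that x reaches in every tree.
        avg = [0.0] * n_classes
        for root, leaf_dist in forest:
            for c, p in enumerate(leaf_dist[route(root, x)]):
                avg[c] += p / len(forest)
        return max(range(n_classes), key=lambda c: avg[c])

    return predict
```

The propagation graph now has one node per leaf rather than one per image, so with leaves of at most `leaf_size` instances it shrinks by up to that factor. `rflp(points, labels, n_classes=2)` returns a `predict` function that can be applied to every unlabeled instance. Averaging over several trees smooths out splits that happen to place instances of different classes in the same leaf.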