Automatic image semantic annotation is a challenging research topic. Common machine learning approaches, including statistical generative models and discriminative models, have been applied to this problem. However, owing to the semantic gap, the imbalance of image training data, and the multi-label nature of image annotation, the performance of these methods still needs improvement. This paper proposes a generative-model image annotation method based on a discriminative hyperplane tree. The method builds a local hyperplane classification tree from the high-generative-probability neighborhood of the image to be annotated, and then, by exploiting the discriminative information between classes at the same level, performs top-down hierarchical classification to obtain the set of semantically relevant images for the unlabeled image. The resulting relevant-class information is combined with a new generative-model framework to estimate the joint probability between the unlabeled image and the semantic keywords, thereby annotating the target image. The distinguishing feature of the method is the effective combination of generative and discriminative approaches: the discriminative analysis of latent semantic clusters by the hyperplane tree is a step-by-step refinement of the generative "neighborhood" of the unlabeled image, which effectively improves the annotation accuracy of the generative model; meanwhile, for problems that discriminative analysis handles poorly, such as multi-label classification and imbalanced training data, the method achieves multi-label assignment for the target image naturally through joint probability estimation. Experiments were conducted on the widely used ECCV2002 dataset of 5,000 images. The results show that, compared with the generative-model-based MBRM model (which uses image segmentation) and the discriminative ASVM-MIL approach, both known to perform well, this method improves the F1 measure by 14% and 13%, respectively.
Many machine learning methods, such as generative models and discriminative models, have been applied to automatic image semantic annotation. However, due to the "semantic gap", imbalanced training data, and the multi-label characteristic of image annotation, annotation performance still calls for improvement. In this paper, an image annotation method is proposed that augments the classical generative model with a discriminative hyperplane tree. Based on the high-generative-probability training images (the neighborhood) of the unlabeled image, a local hyperplane classification tree is adaptively established. The semantically relevant training image set is then obtained through a top-down hierarchical classification procedure that exploits the discriminative information at each level. Under the proposed framework, the joint probability between the unlabeled image and the semantic words is estimated over this semantically relevant local training set. The method combines the advantages of both generative and discriminative models. From the generative-model perspective, exploiting the discriminative information of the semantic clusters in the hyperplane tree progressively refines the local generative set and thereby improves annotation accuracy. From the discriminative-model perspective, multiple-label assignment is achieved naturally by estimating the joint probability, which mitigates the limitations of discriminative models caused by imbalanced and overlapping training sets. Experiments on the ECCV2002 benchmark of 5,000 images show that the method outperforms the state-of-the-art generative-model-based annotation method MBRM and the discriminative-model-based ASVM-MIL, improving the F1 measure by 14% and 13%, respectively.
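The abstract describes a three-step pipeline: score a generative neighborhood for the unlabeled image, refine that neighborhood top-down with a discriminative hyperplane tree, and estimate the image-word joint probability over the refined set. The following is a minimal sketch of such a pipeline, not the authors' implementation: the Gaussian-kernel scoring, the 2-means/linear-SVM tree splits, and all names and parameters (generative_score, refine_neighborhood, annotate, bandwidth, k_neigh, etc.) are illustrative assumptions.

```python
# Minimal sketch of a generative-neighborhood + discriminative-tree annotator.
# All design choices here are assumptions made for illustration.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def generative_score(x, X, bandwidth=1.0):
    """Gaussian-kernel generative score of x against every training image in X."""
    d2 = np.sum((X - x) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def refine_neighborhood(x, X, idx, min_size=20, depth=3):
    """Top-down discriminative refinement: split the current neighborhood into
    two latent-semantic clusters, fit a separating hyperplane, and keep the
    side that the unlabeled image x falls on (one level of the tree)."""
    if depth == 0 or len(idx) <= min_size:
        return idx
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(X[idx])
    if len(set(labels)) < 2:
        return idx
    svm = LinearSVC().fit(X[idx], labels)      # hyperplane between the two clusters
    side = svm.predict(x.reshape(1, -1))[0]    # side containing the unlabeled image
    kept = idx[labels == side]
    return refine_neighborhood(x, X, kept, min_size, depth - 1)

def annotate(x, X, word_matrix, vocab, k_neigh=100, top_k=5):
    """Estimate P(w, x) over the refined relevant set and return the top words."""
    scores = generative_score(x, X)
    neigh = np.argsort(scores)[::-1][:k_neigh]         # high-generative-probability neighborhood
    relevant = refine_neighborhood(x, X, neigh)
    # Joint probability estimate: weight each relevant training image by its
    # generative score and accumulate the words that annotate it.
    w = scores[relevant] / scores[relevant].sum()
    p_word = w @ word_matrix[relevant]                 # P(w | x), up to a constant
    return [vocab[i] for i in np.argsort(p_word)[::-1][:top_k]]

# Toy usage with random data in place of real image features and annotations.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 32))                          # training image features
word_matrix = (rng.random((500, 10)) < 0.2).astype(float)  # image-word incidence
vocab = [f"word{i}" for i in range(10)]
print(annotate(rng.normal(size=32), X, word_matrix, vocab))
```

Because the word scores come from a weighted sum over the whole relevant set rather than from per-class decisions, several words can receive high probability at once, which is how the joint-probability formulation yields multi-label assignment naturally.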