To automatically label important persons in news images, and to cope with the variation in facial appearance caused by differences in expression, illumination, and pose, we propose a person labeling method that fuses textual and visual multi-modal information. First, for each target name, we retrieve the subset of face images associated with that name and establish a mapping between the name and the faces. Second, we compute similarity in the text space, and in the visual space we cluster each face subset and compute similarity. Finally, we apply a weighted Borda method to fuse the similarity rankings obtained from the text and visual spaces. Experiments on a dataset of approximately half a million Yahoo! News images show that the proposed method significantly improves on clustering-based methods.
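The final fusion step can be illustrated with a minimal sketch of a weighted Borda count. The abstract does not give the scoring scheme or the weights, so everything here is an assumption made for illustration: the function name `weighted_borda`, the linear point scheme (the top item of n earns n - 1 points), the weights 0.4 and 0.6, and the face identifiers.

```python
from typing import Dict, List, Sequence


def weighted_borda(rankings: Sequence[List[str]], weights: Sequence[float]) -> List[str]:
    """Fuse several rankings of the same items with a weighted Borda count.

    Assumed scheme (not taken from the paper): in a ranking of n items,
    the item at 0-based position p earns n - 1 - p points, scaled by the
    weight of that ranking.
    """
    if len(rankings) != len(weights):
        raise ValueError("need exactly one weight per ranking")
    scores: Dict[str, float] = {}
    for ranking, weight in zip(rankings, weights):
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] = scores.get(item, 0.0) + weight * (n - 1 - position)
    # Sort by descending fused score; break ties by item name for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical candidate faces for one name, ranked by similarity in each space.
text_ranking = ["face_b", "face_a", "face_c", "face_d"]
visual_ranking = ["face_a", "face_c", "face_b", "face_d"]

# Hypothetical weights: the visual ranking is trusted slightly more here.
fused = weighted_borda([text_ranking, visual_ranking], weights=[0.4, 0.6])
print(fused)
```

Because only rank positions enter the score, the similarity values from the text and visual spaces do not need to be on a common scale, which is one reason rank-level fusion suits heterogeneous modalities.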