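To automatically discover the one-to-one correspondence between the names in news captions and the face images in news photos, we propose a method for automatically identifying political figures in news based on a multi-class SVM. First, for each name, the corresponding image set is retrieved; next, a face detection algorithm is applied to the image set to detect all face images. To reduce computation and improve clustering accuracy, the face images are divided into two groups. The first group is clustered, and the face images in the largest cluster are taken as the initial training samples for that name; initial training samples for the other names are found in the same way. To improve the reliability of the training samples, the samples are selected through iterative updating while the multi-class SVM is trained. Finally, the multi-class SVM is used to classify the second group of face images, achieving automatic identification of political figures in the news. Experiments on a Yahoo News image dataset of about 500,000 images show that the method effectively improves on the performance of existing methods.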
To automatically match names with faces in news, we propose a novel method for automatically identifying celebrities in news images based on a multi-class SVM. For a given name, an image collection is selected according to the captions in which the name appears. Then, to automatically obtain positive training samples for learning the classifier, the face images in the collection are divided into two groups. AP clustering is performed on the first group of face images to identify positive samples, which serve as the initial samples for training the SVM, and the positive training samples are then selected by iteratively training the multi-class SVM. The same method is applied to the other given names. Finally, the multi-class SVM is used to classify the second group of face images. Experiments are conducted on a news dataset that consists of thousands of news images with associated captions collected from Yahoo News. Compared with existing methods, the proposed method yields much better performance.
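The pipeline summarized above can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the authors' implementation: it reads "AP clustering" as affinity propagation, uses scikit-learn's `AffinityPropagation` and `SVC`, replaces real face descriptors with synthetic 2-D features, and assumes one plausible form of the iterative sample update (keep only those first-group faces that the current SVM assigns to their own name). The helper names `largest_cluster`, `train_name_classifier`, and `make_demo`, and the choice of clustering preference, are hypothetical.

```python
from collections import Counter

import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.svm import SVC


def largest_cluster(features, random_state=0):
    """Return a boolean mask selecting the largest affinity-propagation cluster."""
    # Pairwise squared Euclidean distances; AP's default similarity is their negative.
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(axis=-1)
    # Assumption: use the minimum similarity as preference, which favours few clusters.
    ap = AffinityPropagation(preference=-d2.max(), damping=0.9,
                             random_state=random_state)
    labels = ap.fit_predict(features)
    # If AP reports no clusters (all labels -1), this falls back to the whole group.
    top_label, _ = Counter(labels.tolist()).most_common(1)[0]
    return labels == top_label


def train_name_classifier(group1, n_rounds=5):
    """group1 maps each name to an (n_faces, dim) array of first-group features."""
    names = sorted(group1)
    # Initial training samples: the largest cluster of each name's first group.
    selected = {n: largest_cluster(group1[n]) for n in names}
    clf = None
    for _ in range(n_rounds):
        X = np.vstack([group1[n][selected[n]] for n in names])
        y = np.repeat(names, [int(selected[n].sum()) for n in names])
        clf = SVC(kernel="rbf", gamma="scale").fit(X, y)  # multi-class SVM
        # Assumed update rule: keep faces the SVM assigns to their own name.
        updated = {n: clf.predict(group1[n]) == n for n in names}
        if all(np.array_equal(updated[n], selected[n]) for n in names):
            break  # training set is stable
        selected = updated
    return clf


def make_demo(seed=0):
    """Synthetic stand-in for face features: one 2-D Gaussian blob per person."""
    rng = np.random.default_rng(seed)
    centers = {"A": [0.0, 0.0], "B": [10.0, 0.0], "C": [0.0, 10.0]}

    def faces(name, count):
        return rng.normal(centers[name], 1.0, size=(count, 2))

    group1, group2_X, group2_y = {}, [], []
    for name in centers:
        others = [o for o in centers if o != name]
        # First group: mostly the named person plus a few co-occurring faces.
        group1[name] = np.vstack([faces(name, 30)] + [faces(o, 2) for o in others])
        # Second group: faces to be labeled by the trained classifier.
        group2_X.append(faces(name, 20))
        group2_y += [name] * 20
    return group1, np.vstack(group2_X), np.array(group2_y)


group1, X2, y2 = make_demo()
clf = train_name_classifier(group1)
accuracy = float((clf.predict(X2) == y2).mean())
print(f"second-group accuracy: {accuracy:.2f}")
```

In this sketch the first group serves only to build and refine the training set, so the second group is never seen during training, mirroring the two-group split described in the abstract. With real data, the synthetic blobs would be replaced by descriptors of the detected faces.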