Image recognition that relies solely on low-level visual features, that is, on a single modality, suffers from low accuracy. Since many images contain embedded text, a semantic-level text-collaborative image recognition method is proposed that uses this text to aid recognition. The method has three steps. First, text blocks in the image are located by a text localization method, then segmented, binarized, and passed through feature extraction to obtain the semantics of the text. Second, low-level visual features of the image are extracted and the correlation between the visual and textual modalities is computed, which yields the collaborative posterior probability. Finally, the joint posterior probability of each image class is computed from the results of the first two steps, and a new image is assigned to the class with the maximum joint posterior probability. In experiments on a self-built data set of sports video frames, the proposed method achieved higher accuracy than a single-modality baseline, represented by naive Bayes, with each of three different visual features. The results indicate that textual collaboration can effectively aid image recognition and gives better recognition performance.
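The final decision step, which picks the class with the maximum joint posterior probability, can be illustrated with a minimal sketch. The abstract does not state how the joint posterior is formed, so the fusion rule below is an assumption: the two modalities are treated as conditionally independent given the class, so that P(c | visual, text) is proportional to P(c | visual) · P(c | text) / P(c). The function names `joint_posterior` and `recognize`, the class labels, and all probability values are hypothetical and chosen only for illustration.

```python
def joint_posterior(visual_post, text_post, prior):
    """Fuse per-class posteriors from the visual and textual modalities.

    ASSUMPTION (not taken from the paper): the modalities are conditionally
    independent given the class, so
        P(c | v, t)  is proportional to  P(c | v) * P(c | t) / P(c).
    """
    scores = [v * t / p for v, t, p in zip(visual_post, text_post, prior)]
    total = sum(scores)
    # Normalize so the fused scores form a probability distribution.
    return [s / total for s in scores]


def recognize(classes, visual_post, text_post, prior):
    """Return the class with the maximum joint posterior, plus all posteriors."""
    post = joint_posterior(visual_post, text_post, prior)
    best = max(range(len(classes)), key=post.__getitem__)
    return classes[best], post


# Hypothetical example: three sports classes with a uniform prior.
classes = ["basketball", "football", "tennis"]
visual_post = [0.5, 0.3, 0.2]   # visual features alone favor "basketball"
text_post = [0.2, 0.7, 0.1]     # embedded text favors "football"
prior = [1 / 3, 1 / 3, 1 / 3]

label, post = recognize(classes, visual_post, text_post, prior)
print(label, [round(p, 3) for p in post])
```

In this made-up example the visual posterior alone would select "basketball", while the fused posterior selects "football", which shows how evidence from embedded text could change the decision under the stated assumption.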