To address the single-modality limitation of traditional content-based multimedia retrieval, a novel cross-media retrieval approach is proposed. Canonical correlations between the content features of different modalities are analyzed in a statistical sense, and subspace mapping is used to resolve the heterogeneity of the feature vectors. Prior knowledge obtained from relevance feedback is further incorporated to refine the topology of the multi-modality multimedia dataset in the subspace, enabling accurate measurement of cross-media correlation. Experiments on image and audio data demonstrate the effectiveness of the proposed correlation-learning-based cross-media retrieval method.
Most traditional content-based multimedia retrieval methods are designed for data of a single modality, such as image retrieval, audio retrieval, and video retrieval. This paper proposes a novel cross-media retrieval approach that handles multimedia data of different modalities and measures cross-media similarity, such as image-audio similarity. First, a statistical method is used to learn canonical correlations between the low-level feature spaces of different modalities. Then, subspace mapping builds an isomorphic subspace and resolves the heterogeneity between the different low-level feature vectors; this subspace contains media objects of all modalities, each represented by an isomorphic vector. Because canonical correlations among multimedia objects are maximally preserved during the mapping, cross-media similarity can be estimated with a defined distance metric. Furthermore, relevance feedback from users is exploited to learn prior knowledge and refine the multimedia topology in the subspace, making cross-media similarity more consistent with human perception through user interaction. Both image and audio data are used for experiments and comparisons. Given the same visual and auditory features, the new approach outperforms ICA, PCA, and PLS in both precision and recall. Overall, cross-media retrieval results between images and audio are very encouraging.