The rapidly growing multimedia information on the Internet often contains several different modalities, and the different modalities within the same multimedia document tend to convey similar meanings. Multimodal retrieval has therefore recently become a hot topic in multimedia retrieval research. This paper proposes a multimodal retrieval model based on probabilistic latent semantic analysis (pLSA). Two assumptions are made: (1) the different modalities of a multimedia document are alternative representations of that document and thus convey similar meanings; (2) textual words and visual features are generated independently. pLSA is used to model the generative processes of the texts and the images in the training set separately, and their latent topic distributions are learned with the expectation-maximization (EM) algorithm. Multivariate linear regression is then applied to analyze the relation between the text and image representations, and the regression matrix is estimated by the least squares method. This matrix is used to transform between the text and image modalities. Experiments demonstrate the effectiveness of the proposed method.
Nowadays, multimedia information, which has grown explosively on the Internet, usually consists of a variety of different modal contents, and these multi-modal contents within the same document often convey similar meanings. Multimodal retrieval has therefore recently become a hot topic in multimedia retrieval research. In this paper, we propose a multimodal multimedia retrieval model based on probabilistic latent semantic analysis (pLSA). Two hypotheses are made: (1) the different modal contents (text and image) of a document are representations of that document in different forms and thus convey similar meanings, and (2) the textual words and the visual features are generated independently of each other. We employ the generative model pLSA to model the generative processes of the texts and the images of the same documents in the training set, and the topics of the pLSA models are learned by the EM algorithm. We then employ multivariate linear regression to analyze the correlation between the representations of texts and images and use the ordinary least squares (OLS) method to estimate the regression matrix, which can be used to transform between textual and visual modal data. Extensive experimental results demonstrate the effectiveness and efficiency of the proposed model.
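As a rough illustration of the pipeline described in the abstract, the following NumPy sketch fits pLSA by EM to document-word count matrices for the text and image modalities separately, then estimates an OLS regression matrix mapping text topic vectors to image topic vectors. This is not the authors' implementation: the input matrices `text_counts` and `visual_counts` (visual-word histograms), the topic numbers, and the `plsa` helper are hypothetical placeholders, and the EM update follows the standard pLSA formulation.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit pLSA to a (n_docs, n_words) count matrix via EM.

    Returns P(z|d) of shape (n_docs, n_topics) and P(w|z) of shape (n_topics, n_words).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # Random initialisation, normalised to valid probability distributions.
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: posterior P(z|d,w) for every (d, w) pair, shape (n_docs, n_words, n_topics).
        post = p_z_d[:, None, :] * p_w_z.T[None, :, :]
        post /= post.sum(axis=2, keepdims=True) + 1e-12
        # M-step: re-estimate P(z|d) and P(w|z) from the expected counts n(d,w) * P(z|d,w).
        weighted = counts[:, :, None] * post
        p_z_d = weighted.sum(axis=1)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
        p_w_z = weighted.sum(axis=0).T
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z

# Topic representations of the same training documents in the two modalities
# (text_counts and visual_counts are assumed co-occurrence matrices, not from the paper).
text_topics, _ = plsa(text_counts, n_topics=20)     # (n_docs, K_text)
image_topics, _ = plsa(visual_counts, n_topics=20)  # (n_docs, K_image)

# Ordinary least squares: find M such that text_topics @ M approximates image_topics.
M, *_ = np.linalg.lstsq(text_topics, image_topics, rcond=None)
```

In this sketch, a text query would be folded into the text topic space, mapped through `M` into the image topic space, and matched against database images by a similarity measure such as cosine similarity; the reverse direction can be obtained by regressing in the opposite order.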