In video semantic understanding and mining, fully exploiting the interactions among multimodal media such as image, audio, and text is an important research direction. Considering the multimodal and temporally associated co-occurrence characteristics of video, this paper proposes a semantic concept detection method based on correlation propagation across multimodal subspaces to mine video semantics. From the multimodal low-level features extracted from video shots, the method derives shot-to-shot similarity by propagating correlations across the multimodal subspaces using co-occurrence data embedding (CODE) and SimFusion; it then applies locality preserving projections (LPP) to reduce the dimensionality of the original data and obtain coordinates in a low-dimensional semantic space, and trains a classification model with the annotation labels, so that semantic concepts can be detected on test data outside the training set, realizing video semantics mining. Experiments show that the method achieves high accuracy.
Research on content-based multimedia retrieval is motivated by the growing amount of digital multimedia content, of which video data forms a large part. The interaction and integration of multiple modalities such as visual, audio, and textual data in video are the essence of video content analysis. Although each single modality conveys partial semantics, the full semantics of video are manifested only through the interaction and integration of all modalities. Video data contains rich semantics, such as people, scenes, objects, events, and stories. A great deal of research has focused on utilizing multi-modality features for a better understanding of video semantics. This paper proposes a new approach to detecting semantic concepts in video using co-occurrence data embedding (CODE), SimFusion, and locality preserving projections (LPP) applied to temporally associated, co-occurring multimodal media data in video. The authors' experiments show that by employing these key techniques, the performance of video semantic concept detection can be improved and better video semantics mining results can be obtained.
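The LPP step mentioned above can be illustrated with a minimal sketch. This is not the paper's implementation; the neighborhood size `k`, heat-kernel width `t`, target dimensionality `n_components`, and the small regularization term are illustrative assumptions. LPP builds a k-nearest-neighbor graph over the samples, forms the graph Laplacian, and solves a generalized eigenproblem whose smallest eigenvectors give a linear projection that preserves local neighborhood structure:

```python
# Hypothetical LPP (locality preserving projections) sketch; parameters
# k, t, and n_components are illustrative, not taken from the paper.
import numpy as np
from scipy.linalg import eigh

def lpp(X, n_components=2, k=5, t=1.0):
    """Project rows of X (n_samples x n_features) to n_components dims."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances between samples.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    # k-nearest-neighbor adjacency with heat-kernel weights.
    W = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(sq[i])[1:k + 1]   # nearest neighbors, skipping self
        W[i, idx] = np.exp(-sq[i, idx] / t)
    W = np.maximum(W, W.T)                 # symmetrize the graph
    D = np.diag(W.sum(axis=1))             # degree matrix
    L = D - W                              # graph Laplacian
    # Generalized eigenproblem  X^T L X a = lambda X^T D X a;
    # the eigenvectors with the smallest eigenvalues define the projection.
    A = X.T @ L @ X
    B = X.T @ D @ X + 1e-9 * np.eye(X.shape[1])  # regularize for stability
    _, vecs = eigh(A, B)
    return X @ vecs[:, :n_components]      # low-dimensional coordinates
```

In the pipeline described above, the rows of `X` would be the fused multimodal shot features (after CODE/SimFusion correlation propagation), and the returned low-dimensional coordinates would then be used to train the concept classifier.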