This paper proposes a video content understanding approach based on a user attention space and attention analysis. The approach extracts multi-channel attention information from videos, lets users customize the attended content according to their personal preferences, and thereby supports efficient video browsing and access. First, the main sound types in a video are classified by an audio classification algorithm built on a hierarchical binary-tree framework with classifier selection. Then the visual, aural, and temporal factors that influence user attention are defined as the user attention space; corresponding mid-level features represent attention in each of these three modalities and are used to compute attention values, bridging low-level audiovisual features and high-level cognition. An ordinal-decision fusion strategy combines the visual and aural attention models to form a temporal attention curve, from which highlight segments are extracted as a video summary. Finally, a support vector regression model combined with a relevance feedback mechanism ranks the highlight segments according to their impressive values, so that the ranking better matches each user's personal preferences. Because the approach models the changes of human attention while watching a video, rather than simple changes of video content, it is more consistent with human understanding. Experimental results demonstrate that the approach is effective for generating personalized video summaries and highlight rankings.
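As a rough illustration of the attention-curve stage, the Python sketch below fuses per-shot visual and aural attention values and extracts segments that stay above a threshold. The weighted combination, the threshold, and the minimum segment length are placeholders chosen for illustration only; the paper's actual ordinal-decision fusion strategy is not reproduced here.

```python
import numpy as np

def fuse_attention(visual, aural, w_visual=0.6, w_aural=0.4):
    """Fuse per-shot visual and aural attention values into one curve.

    A simple weighted combination used as a stand-in for the paper's
    ordinal-decision fusion; the weights are illustrative only.
    """
    def normalize(x):
        x = np.asarray(x, dtype=float)
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)
    return w_visual * normalize(visual) + w_aural * normalize(aural)

def extract_highlights(attention_curve, threshold=0.7, min_len=3):
    """Return (start, end) shot-index pairs of segments whose attention
    stays above `threshold` for at least `min_len` consecutive shots."""
    highlights, start = [], None
    for i, v in enumerate(attention_curve):
        if v >= threshold and start is None:
            start = i
        elif v < threshold and start is not None:
            if i - start >= min_len:
                highlights.append((start, i))
            start = None
    if start is not None and len(attention_curve) - start >= min_len:
        highlights.append((start, len(attention_curve)))
    return highlights

# Toy example: 20 shots with synthetic attention values.
visual = np.random.rand(20)
aural = np.random.rand(20)
curve = fuse_attention(visual, aural)
print(extract_highlights(curve, threshold=0.6))
```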
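The ranking stage can likewise be sketched with an off-the-shelf support vector regression model. The feature matrix, the initial scores, and the way user feedback is folded back into the training targets are all assumptions made for illustration; the paper's own relevance feedback scheme may differ.

```python
import numpy as np
from sklearn.svm import SVR

def rank_highlights(features, scores):
    """Fit an SVR on mid-level features of highlight segments and rank
    segments by predicted score (higher = more impressive)."""
    model = SVR(kernel="rbf", C=1.0, epsilon=0.1)
    model.fit(features, scores)
    order = np.argsort(-model.predict(features))  # descending order
    return model, order

def relevance_feedback(model, features, scores, feedback):
    """Fold user feedback (segment index -> corrected score) into the
    training targets and retrain, nudging the ranking toward the
    user's personal preference."""
    scores = scores.copy()
    for idx, corrected in feedback.items():
        scores[idx] = corrected
    model.fit(features, scores)
    order = np.argsort(-model.predict(features))
    return model, order, scores

# Toy example: 10 segments described by 5 mid-level features each.
rng = np.random.default_rng(0)
X = rng.random((10, 5))
y = rng.random(10)                       # initial "impressiveness" scores
model, order = rank_highlights(X, y)
model, order, y = relevance_feedback(model, X, y, {order[0]: 0.1})
print(order)
```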