Video classification plays an important role in applications such as video retrieval and content analysis. Multi-modal video features, including audio, static image, and motion features, are already widely used in video classification, so how to combine multiple video features optimally to improve classification performance has become an important research topic. In this paper we propose an L1-regularised distance learning method and use it to study how combinations of multiple features can improve video semantic annotation. Because an L1-norm regularisation term is introduced, the model is able to select among the video features and combine the selected ones optimally. Experiments show that the approach substantially improves video classification performance on the widely used Columbia Consumer Video (CCV) dataset.
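
The abstract does not state the objective function, so the following is only a minimal sketch of how an L1 penalty can select among per-feature distances. It is not the formulation used in the paper. It assumes a non-negative lasso over pairwise distances (target 0 for same-class pairs, 1 for different-class pairs) solved by proximal gradient descent. The function name `learn_feature_weights`, the parameters `lam` and `n_iter`, and the synthetic data are all hypothetical.

```python
import numpy as np

def learn_feature_weights(pair_dists, targets, lam=0.05, n_iter=500):
    """Illustrative non-negative lasso over per-feature distances.

    pair_dists: (n_pairs, n_features); column m holds the distance between
                each pair of videos under feature m (e.g. audio, image, motion).
    targets:    (n_pairs,); 0 for same-class pairs, 1 for different-class pairs.
    Returns one non-negative weight per feature.
    """
    D = np.asarray(pair_dists, dtype=float)
    t = np.asarray(targets, dtype=float)
    n = D.shape[0]
    # Step size 1/L, where L is the Lipschitz constant of the gradient
    # of the squared-error term (largest singular value squared over n).
    step = n / np.linalg.norm(D, 2) ** 2
    w = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ w - t) / n
        # Proximal step for lam * ||w||_1 under the constraint w >= 0:
        # shift by step * lam, then clip at zero.
        w = np.maximum(0.0, w - step * (grad + lam))
    return w

# Synthetic example (assumed data): feature 0 separates the pairs,
# features 1 and 2 are uninformative noise distances.
rng = np.random.default_rng(0)
n_pairs = 400
t = rng.integers(0, 2, n_pairs).astype(float)
D = np.column_stack([
    np.abs(t + 0.1 * rng.normal(size=n_pairs)),
    np.abs(rng.normal(size=n_pairs)),
    np.abs(rng.normal(size=n_pairs)),
])
w = learn_feature_weights(D, t)
print(w)
```

In this toy setting the informative feature should receive the dominant weight while the noise features are driven to zero or close to it, which is the selection behaviour the L1 term is meant to provide.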