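To address the need to regulate online video, a violent-video classification method based on a bag of audio words is proposed. MPEG-7 (Multimedia Content Description Interface) audio features are extracted from the audio stream of each video, including low-level features such as the audio spectrum centroid and audio spectrum spread, together with the MPEG-7 high-level feature, the audio signature. These are used to construct audio words specific to each video, and the frequencies with which those words occur form the bag-of-audio-words feature. A support vector machine classifies videos as violent or non-violent. The bag-of-words model is applied to the classification of violent audio features; a dedicated word-weighting scheme is used for different audio vocabulary sizes, together with a classification strategy tailored to violent video, to improve classification performance. Three groups of experiments examine the accuracy of different audio features, the classification performance of different vocabularies, and the refined classification of results coarsely classified by visual features. The experimental results show that the method achieves good recall.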
A new method for classifying violent videos using a bag of audio words is introduced. MPEG-7 audio descriptors are first extracted, including low-level features such as AudioSpectrumCentroid and AudioSpectrumSpread. Audio words are then built from the MPEG-7 high-level descriptor AudioSignature, which is regarded as the fingerprint of the audio stream. A support vector machine classifies the feature vectors into two classes, violent and non-violent. Three experiments are reported: on different types of audio words, on different vocabulary sizes, and on the classification of shots detected from visual features. The experimental results show that the proposed method achieves good recall.
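The bag-of-audio-words step can be illustrated with a minimal sketch. It assumes a codebook of audio words is already available and that each audio frame is a short feature vector (for example, a centroid and spread pair); the codebook values, the frame values, the function names, and the plain term-frequency weighting are all illustrative assumptions, not the weighting scheme or features used in the paper. The resulting histogram is the kind of fixed-length vector that could then be passed to a support vector machine.

```python
from collections import Counter


def nearest_word(frame, codebook):
    """Return the index of the codeword closest to `frame` (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(frame, codebook[k])),
    )


def bag_of_audio_words(frames, codebook):
    """Build a normalized word-frequency histogram for one audio stream.

    Assumption: plain term frequency (count / number of frames) stands in
    for the paper's own word-weighting scheme, which is not specified here.
    """
    if not frames:
        return [0.0] * len(codebook)
    counts = Counter(nearest_word(f, codebook) for f in frames)
    return [counts[k] / len(frames) for k in range(len(codebook))]


# Hypothetical codebook of three audio words in a 2-D feature space.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]

# Hypothetical per-frame features for one video's audio stream.
frames = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (4.8, 5.1)]

hist = bag_of_audio_words(frames, codebook)
print(hist)  # [0.25, 0.5, 0.25]
```

Because every video maps to a histogram of the same length regardless of its duration, histograms from labeled violent and non-violent videos can be stacked into a training matrix for a two-class classifier.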