Violence detection in movie shots has been an active research topic in recent years. Early work relied mainly on visual features. Attention has since shifted toward audio, because audio information is stable and remains consistent across cultures and audiences. This paper uses audio features to detect violent audio events in movie shots and proposes a feature extraction method based on multiple time scales. In addition to short-term features such as MFCC, LPC, and short-term energy, the method extracts long-term features: the mean and variance of energy, the mean and variance of sub-band energy, and inter-frame differences. Three audio events occur frequently in violent shots and are representative of them: explosions, screams, and gunshots. Taking the movie shot as the unit of recognition, the paper implements a detection system based on a support vector machine (SVM) classifier. Experiments on 15 Hollywood movies show that the proposed multi-scale audio features achieve good results in violent audio event detection.
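The following is a minimal sketch of how short-term and long-term audio features can be combined. It is not the paper's implementation, and several choices are assumptions: the frame length, hop size, number of frames per long-term window, number of equal-width sub-bands, and the naive DFT. The "inter-frame difference" is taken here to mean the average absolute change in short-term energy between consecutive frames. MFCC and LPC are left out. All function names are hypothetical.

```python
import math
from statistics import mean, pvariance


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping short-term frames."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def short_term_energy(frame):
    """Short-term feature: mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def subband_energies(frame, n_bands):
    """Power in n_bands equal-width bands of the frame's spectrum.

    Uses a naive O(n^2) DFT for self-containment (assumption; a real system
    would use an FFT). Bins beyond n_bands * width are ignored.
    """
    n = len(frame)
    half = n // 2
    power = []
    for k in range(half):
        re = sum(frame[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(frame[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        power.append(re * re + im * im)
    width = half // n_bands
    return [sum(power[b * width:(b + 1) * width]) for b in range(n_bands)]


def long_term_features(x, frame_len=256, hop=128, frames_per_window=20, n_bands=4):
    """Aggregate per-frame (short-term) values over non-overlapping windows.

    Each window yields one vector:
      [energy mean, energy variance,
       band_0 mean, band_0 variance, ..., band_{n-1} mean, band_{n-1} variance,
       mean absolute inter-frame energy difference]
    """
    if frames_per_window < 2:
        raise ValueError("need at least 2 frames per window for differences")
    frames = frame_signal(x, frame_len, hop)
    energy = [short_term_energy(f) for f in frames]
    bands = [subband_energies(f, n_bands) for f in frames]

    features = []
    for start in range(0, len(frames) - frames_per_window + 1, frames_per_window):
        e = energy[start:start + frames_per_window]
        b = bands[start:start + frames_per_window]
        vec = [mean(e), pvariance(e)]
        for j in range(n_bands):
            band_j = [row[j] for row in b]
            vec += [mean(band_j), pvariance(band_j)]
        diffs = [abs(e[i] - e[i - 1]) for i in range(1, len(e))]
        vec.append(mean(diffs))
        features.append(vec)
    return features
```

Usage: `long_term_features(samples, frame_len=32, hop=16, frames_per_window=4, n_bands=2)` returns one vector per long-term window. The per-shot feature vectors could then be passed to an SVM classifier. Keeping the frame-level computations separate from the window-level statistics makes it easy to change either time scale on its own.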