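Human action recognition is important in fields such as video surveillance and medical diagnosis. Current approaches mainly extend hand-designed two-dimensional features to three dimensions, or extract spatio-temporal features from motion trajectories. Following the idea of deep learning, this paper builds a multilayer neural network directly in three-dimensional space and learns the spatio-temporal features of different actions from a large amount of video data. First, independent subspace analysis (ISA) is used to construct a two-layer stacked convolutional neural network whose weights are learned from the training videos. The features are then clustered with K-means and converted into visual words, and the decision hyperplane of a support vector machine (SVM) is computed from the frequency histograms of the visual words. Finally, the actions in the videos under analysis are classified. The method is evaluated on 12 action classes of the Hollywood2 database. The results show that the feature weights learned by ISA resemble Gabor filters: they are clearly selective for image frequency and orientation and robust to phase changes, which is consistent with the characteristics of human vision, and they significantly improve the accuracy of action recognition.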
Human action recognition plays an important role in fields such as video surveillance and medical diagnosis. Current methods either extend hand-designed two-dimensional features to three dimensions or extract spatio-temporal features from trajectories. Based on deep learning, this paper proposes a multilayer neural network built directly in three-dimensional space that learns rich spatio-temporal features from a large amount of video. First, we use independent subspace analysis (ISA) to build a two-layer stacked convolutional neural network and learn its weights from the training data. The spatio-temporal features are then quantized into visual words with K-means clustering. A non-linear support vector machine (SVM) is used to classify the frequency histograms of visual words into action classes. We apply the algorithm to the Hollywood2 database, extracting spatio-temporal features from 12 human action classes. The results show that the feature weights learned by the ISA network are similar to Gabor filters: they are clearly selective for frequency and orientation and robust to phase variation, which is consistent with the human visual system.
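The quantization step can be illustrated with a minimal sketch. It assumes the K-means centroids are already available and shows only how local features from one clip might be mapped to visual words and counted into a normalized frequency histogram, which is the vector the SVM would then classify. The function names `nearest_word` and `word_histogram`, the toy centroids, and the 2-D features are hypothetical and chosen for illustration; they are not taken from the paper.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def nearest_word(feature, centroids):
    """Return the index of the centroid (visual word) closest to the feature."""
    return min(range(len(centroids)), key=lambda k: dist(feature, centroids[k]))


def word_histogram(features, centroids):
    """Assign each local feature to a visual word and return normalized counts."""
    counts = [0] * len(centroids)
    for feature in features:
        counts[nearest_word(feature, centroids)] += 1
    total = sum(counts)
    # An empty clip yields an all-zero histogram instead of dividing by zero.
    return [c / total for c in counts] if total else [0.0] * len(centroids)


# Hypothetical toy data: three 2-D centroids and four local features from one clip.
# Real ISA features would be high-dimensional and the vocabulary much larger.
centroids = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
features = [(0.1, -0.2), (0.9, 1.2), (4.8, 5.1), (5.3, 4.9)]
histogram = word_histogram(features, centroids)
```

In this toy case one feature falls on each of the first two words and two on the third. Normalizing by the number of features keeps histograms comparable across clips of different length.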