When a traditional 2D convolutional neural network (CNN) extracts features from consecutive video frames, it tends to lose the target's motion information along the time axis, which lowers recognition accuracy. To address this problem, a hand gesture recognition method based on a multi-column deep 3D CNN is proposed. 3D convolution kernels are applied to the consecutive frames to extract both temporal and spatial features and thereby capture the target's motion information. To avoid misclassification caused by insufficient feature extraction in a single 3D CNN, several 3D CNNs, each with strong classification ability, are trained and combined into a multi-column deep 3D CNN. This structure weighs the outputs of the individual 3D CNNs and takes the category with the largest weight as the final result. Experimental results on the CHGDs dataset show that the multi-column deep 3D CNN achieves a hand gesture recognition rate of 95.09%, nearly 7% higher than a single 3D CNN and nearly 20% higher than a traditional 2D CNN, indicating good recognition ability for targets in continuous image sequences.
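The multi-column decision step can be illustrated with a minimal sketch. The abstract does not give the exact weighting rule, so this sketch assumes that each column outputs a class-probability vector and that the columns are combined by a (by default uniform) weighted average. The function name `fuse_columns`, its parameters, and the example scores are hypothetical and are not taken from the paper.

```python
def fuse_columns(column_probs, column_weights=None):
    """Fuse per-column class probabilities; return (predicted label, fused scores).

    Assumption: fusion is a weighted average of the columns' probability
    vectors, and the class with the largest fused score wins.
    """
    if not column_probs:
        raise ValueError("need at least one column")
    n_classes = len(column_probs[0])
    if any(len(p) != n_classes for p in column_probs):
        raise ValueError("all columns must score the same set of classes")
    if column_weights is None:
        # Uniform weights: every 3D CNN column counts equally.
        column_weights = [1.0] * len(column_probs)
    if len(column_weights) != len(column_probs):
        raise ValueError("one weight per column is required")

    total = sum(column_weights)
    fused = [
        sum(w * p[c] for w, p in zip(column_weights, column_probs)) / total
        for c in range(n_classes)
    ]
    # The class with the largest fused score is the final decision.
    label = max(range(n_classes), key=fused.__getitem__)
    return label, fused


# Hypothetical softmax outputs of three 3D CNN columns for three gesture classes.
probs = [
    [0.6, 0.3, 0.1],  # column 1 alone would pick class 0
    [0.2, 0.5, 0.3],  # column 2 picks class 1
    [0.1, 0.6, 0.3],  # column 3 picks class 1
]
label, fused = fuse_columns(probs)
print(label, [round(s, 3) for s in fused])
```

In this made-up example, the first column on its own would choose class 0, while the fused scores favour class 1. This is the kind of single-column error that combining several columns is meant to correct.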