A multi-GPU parallel framework for deep neural network (DNN) training is proposed, and its implementation and performance optimizations are described. By exploiting the cooperative parallel computing power of multiple GPUs together with data parallelism, the framework achieves fast and efficient DNN training. Applied to speech recognition, it improves both model convergence speed and model quality: compared with a single GPU it achieves a 4.6x speedup, training on billions of samples converges within days, and the word error rate is reduced by about 10%.
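The core idea of data parallelism mentioned above can be illustrated with a minimal sketch (this is not the paper's implementation): each worker, standing in for one GPU, computes the gradient of a toy 1-D linear model on its shard of the minibatch, and the per-worker gradients are averaged before a single weight update, mimicking the all-reduce step of synchronous multi-GPU training. All names here (`shard_gradient`, `data_parallel_step`) are hypothetical.

```python
# Illustrative sketch of synchronous data-parallel training, NOT the
# paper's framework: N workers (stand-ins for GPUs) each compute the
# gradient on their shard, and the averaged gradient is applied once.

def shard_gradient(w, shard):
    """Mean gradient of squared error 0.5*(w*x - y)^2 over one shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)

def data_parallel_step(w, data, n_workers, lr=0.1):
    # Split the minibatch evenly across workers.
    shards = [data[i::n_workers] for i in range(n_workers)]
    grads = [shard_gradient(w, s) for s in shards]
    # "All-reduce" step: average the per-worker gradients; with equal
    # shards this equals the full-batch gradient, so the update matches
    # what a single GPU would compute on the whole minibatch.
    return w - lr * sum(grads) / len(grads)

# Toy data generated by y = 3*x; training should move w toward 3.
data = [(x, 3.0 * x) for x in [1.0, 2.0, 3.0, 4.0]]
w = 0.0
for _ in range(200):
    w = data_parallel_step(w, data, n_workers=2)
print(round(w, 3))  # converges to 3.0
```

In a real multi-GPU setting the gradient average is computed by a collective all-reduce over the devices rather than a Python loop, but the arithmetic is the same.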