To reduce the number of trainable model parameters and lower the word error rate of Uyghur speech recognition, a speech recognition method based on convolutional neural networks (CNNs) is proposed. The method combines local connectivity, weight sharing, and pooling, which greatly reduces the number of parameters that must be trained. In addition, maxout and dropout are used to overcome the data sparsity problem in model training and further improve recognition accuracy. Experimental results on the THUYG-20 Uyghur speech corpus show that, compared with a conventional GMM-HMM (Gaussian mixture model / hidden Markov model) system and a deep neural network (DNN) system, the CNN-based system reduces the word error rate of Uyghur speech recognition by 15.97% and 2.55%, respectively.
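The parameter saving from local connectivity and weight sharing, and where pooling, maxout, and dropout sit in a CNN acoustic model, can be illustrated with a minimal NumPy sketch. It assumes a convolution along the frequency axis of log filterbank features, a common design for speech CNNs. The feature size (40 bands x 11 frames), kernel width, number of feature maps, maxout group size, pooling size, dropout rate, and all function names are hypothetical choices for illustration only, not the configuration used in the experiments above.

```python
import numpy as np

# Hypothetical sizes (assumptions, not the paper's setup):
# 40 filterbank bands x 11 context frames, 64 feature maps, kernels 8 bands wide.
N_BANDS, N_FRAMES = 40, 11
N_MAPS, KERNEL = 64, 8


def conv_freq(x, w, b):
    """Convolve along frequency only.

    Locality: each output unit sees only `k` adjacent bands.
    Weight sharing: the same kernel w[m] is reused at every band offset.
    x: (bands, frames), w: (maps, k, frames), b: (maps,)
    returns: (maps, bands - k + 1)
    """
    n_maps, k, _ = w.shape
    n_out = x.shape[0] - k + 1
    out = np.empty((n_maps, n_out))
    for i in range(n_out):
        patch = x[i:i + k, :]  # local receptive field of k bands
        out[:, i] = np.tensordot(w, patch, axes=([1, 2], [0, 1])) + b
    return out


def maxout(h, pieces):
    """Maxout: keep the maximum over each group of `pieces` consecutive maps."""
    n_maps, length = h.shape
    return h.reshape(n_maps // pieces, pieces, length).max(axis=1)


def max_pool(h, size):
    """Non-overlapping max pooling along frequency (drops any remainder)."""
    n = h.shape[1] // size
    return h[:, :n * size].reshape(h.shape[0], n, size).max(axis=2)


def dropout(h, p, rng, training=True):
    """Inverted dropout: zero each unit with probability p while training."""
    if not training or p == 0.0:
        return h
    mask = rng.random(h.shape) >= p
    return h * mask / (1.0 - p)


def fc_params(n_in, n_out):
    """Fully connected layer: one weight per input/output pair, plus biases."""
    return n_in * n_out + n_out


def conv_params(kernel, frames, maps):
    """Convolutional layer: one shared kernel and one bias per feature map."""
    return kernel * frames * maps + maps


rng = np.random.default_rng(0)
x = rng.standard_normal((N_BANDS, N_FRAMES))        # stand-in for real features
w = 0.01 * rng.standard_normal((N_MAPS, KERNEL, N_FRAMES))
b = np.zeros(N_MAPS)

h = conv_freq(x, w, b)          # (64, 33)
h = maxout(h, pieces=2)         # (32, 33)
h = max_pool(h, size=3)         # (32, 11)
h = dropout(h, p=0.2, rng=rng)  # same shape, some units zeroed

n_conv_units = N_MAPS * (N_BANDS - KERNEL + 1)       # 2112 hidden units
print(h.shape)                                      # (32, 11)
print(conv_params(KERNEL, N_FRAMES, N_MAPS))        # 5696
print(fc_params(N_BANDS * N_FRAMES, n_conv_units))  # 931392
```

With these assumed sizes, the convolutional layer needs 5,696 parameters, whereas a fully connected layer producing the same 2,112 hidden units from the same 440 inputs needs 931,392. This is the kind of reduction that local connectivity and weight sharing provide. Dropout is applied only during training; at test time `training=False` passes activations through unchanged.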