To improve phoneme recognition accuracy in continuous speech recognition, a phoneme recognition method is proposed that is based on phoneme posterior probabilities extracted by a deep belief network (DBN). First, the DBN is pre-trained greedily, layer by layer, with each layer trained as a restricted Boltzmann machine (RBM). A "softmax" output layer is then added to obtain a deep neural network that estimates the posterior probability distribution over phoneme states, i.e. the states of monophone hidden Markov models (HMMs), and the network weights are fine-tuned discriminatively by back-propagation. Finally, the posterior probabilities are used as the HMM emission probabilities, and the resulting sequence of predicted distributions is fed into a standard Viterbi decoder for phoneme recognition. Experiments on the TIMIT corpus show that the system achieves a higher phoneme recognition rate than the GMM/HMM, MLP/HMM and TANDEM systems.
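The final decoding step can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that the network has already produced one posterior distribution over HMM states per frame, and it uses those posteriors directly as emission scores, as the abstract describes. The function name viterbi_decode, the two-state toy model, and the transition and initial probabilities are all hypothetical values chosen for illustration.

```python
import math


def viterbi_decode(posteriors, trans, init):
    """Return the most likely state sequence for a sequence of frames.

    posteriors: list of per-frame lists, posteriors[t][s] = P(state s | frame t)
    trans:      trans[i][j] = P(next state j | current state i)
    init:       init[s] = P(first state is s)
    All three inputs are assumptions of this sketch, not values from the paper.
    """
    n_states = len(init)
    eps = 1e-12  # avoids log(0) for zero-probability entries

    def log(p):
        return math.log(p + eps)

    # Log score of the best path ending in each state at the first frame.
    delta = [log(init[s]) + log(posteriors[0][s]) for s in range(n_states)]
    backptrs = []

    for frame in posteriors[1:]:
        new_delta, ptr = [], []
        for j in range(n_states):
            # Best predecessor state for arriving in state j at this frame.
            best_i = max(range(n_states),
                         key=lambda i: delta[i] + log(trans[i][j]))
            ptr.append(best_i)
            new_delta.append(delta[best_i] + log(trans[best_i][j]) + log(frame[j]))
        delta = new_delta
        backptrs.append(ptr)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Toy example: 4 frames, 2 states (hypothetical numbers).
toy_posteriors = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.1, 0.9]]
toy_trans = [[0.7, 0.3], [0.3, 0.7]]
toy_init = [0.5, 0.5]
toy_path = viterbi_decode(toy_posteriors, toy_trans, toy_init)
print(toy_path)
```

For this toy input the decoded path is [0, 0, 1, 1]: the decoder stays in state 0 while its posterior dominates and switches once the evidence for state 1 outweighs the transition cost. A full system would map the decoded state sequence back to phoneme labels, which this sketch leaves out.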