正音反馈的计算机辅助对外汉语发音训练系统已有发音偏误趋势的标注体系和基于HMM的偏误趋势检测系统。为了进一步提高系统的性能,该文应用深度神经网络进行声学建模,比较Mel频率倒谱系数(Mel-frequency cepstral coefficient,MFCC)、感知线性预测分析系数(perceptual linear predictive analysis,PLP)和Mel滤波器组系数(Mel filter bank,FBank)3种声学特征参数,并利用网格联合技术整合3种声学特征所得的候选网格。实验结果表明:DNN-HMM模型比GMM-HMM实现了更高检测正确率。针对不同发音偏误趋势,3种声学特征有不同表现,联合系统取得最高性能,最终性能为:错误拒绝率5.5%,错误接受率35.6%,检测正确率88.6%。
A previous computer aided pronunciation training (CAPT) system with instructive feedback used mispronunciation tendency labeling in a GMM HMM based detection system. This system is improved here using a DNN-HMM to model the mispronunciation with comparisons of the effects of three kinds of acoustic features, tbe reel-frequency cepstral coefficient (MFCC), the perceptual linear predictive analysis (PLP) and the Mel filter bank (FBank). The lattice rescore method is also used with these three features. The results show that the DNN-HMM gives a better detection rate than the conventional approach based on the GMM-HMM. Different features behave differently in capturing the specific mispronunciation tendencies, so the integration of these three features based on the lattice rescore gives the best results with an FRR of 5.5%, FAR of 35.6%, and DA of 88.6%.