Labeled pronunciation data for mispronunciation detection are scarce. To address this, additional data are used within the Tandem framework to improve the discriminability of the features. Taking English spoken by Chinese learners as the object of study, three relatively easy-to-obtain types of auxiliary data are selected: unlabeled pronunciation data, native Mandarin speech, and native English speech. Experiments show that all three types effectively improve system performance, with the unlabeled data performing best. The effect on system performance is also compared across different frame context lengths, across shallow and deep neural networks (represented by the Multi-Layer Perceptron (MLP) and the Deep Neural Network (DNN), respectively), and across different Tandem feature structures. Finally, a multi-stream fusion strategy is used to further improve performance: the Tandem feature that combines the DNN-based unlabeled-data stream with the native-English stream achieves the best result. Compared with the baseline system, recognition accuracy increases by 7.96% and the diagnostic accuracy of mispronunciation type increases by 14.71%.
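The abstract does not give implementation details, so the following is only an illustrative sketch of two operations it mentions: expanding each frame with neighbouring context frames, and merging per-frame features from several data streams into one Tandem-style feature vector. The function names `splice_frames` and `merge_streams`, the edge handling (repeating the first and last frame), and the use of plain concatenation for merging are all assumptions made for illustration, not the authors' method.

```python
from typing import List

Frame = List[float]


def splice_frames(frames: List[Frame], context: int) -> List[Frame]:
    """Stack each frame with `context` neighbours on each side.

    Assumption: frames beyond the utterance edges are filled by
    repeating the first or last frame.
    """
    n = len(frames)
    spliced: List[Frame] = []
    for t in range(n):
        window: Frame = []
        for offset in range(-context, context + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp index at the edges
            window.extend(frames[idx])
        spliced.append(window)
    return spliced


def merge_streams(*streams: List[Frame]) -> List[Frame]:
    """Concatenate per-frame feature vectors from several streams.

    Assumption: merging is plain concatenation, frame by frame.
    """
    if len({len(s) for s in streams}) != 1:
        raise ValueError("all streams must have the same number of frames")
    return [sum((list(f) for f in group), []) for group in zip(*streams)]
```

In a Tandem setup, the spliced frames would typically be the input to the MLP or DNN, and the network outputs for each data stream would be merged with the acoustic features before being passed to the recognizer. Any post-processing of the network outputs (for example a log transform or dimensionality reduction) is omitted here because the abstract does not describe it.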