This paper presents two methods for phoneme segmentation in the construction of a Tibetan speech synthesis corpus: automatic segmentation based on a monophone HMM and automatic segmentation based on a triphone HMM. Experiments compared the segmentation accuracy of the two models. The overall average accuracy was 80.69% for the monophone model and 88.74% for the triphone model. The results show that the triphone HMM method is markedly more accurate than the monophone method, a gain of 8.05 percentage points, and that it improves the accuracy and consistency of the annotation in the speech corpus.
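The abstract does not state how segmentation accuracy was scored. As a minimal sketch of one common convention, the code below counts an automatic boundary as correct when it falls within a fixed tolerance of the corresponding hand-labelled boundary. The function name `boundary_accuracy`, the 20 ms tolerance, and the toy boundary times are all assumptions made for illustration; they are not taken from the paper and do not reproduce its results.

```python
from typing import Sequence


def boundary_accuracy(
    auto_ms: Sequence[float],
    manual_ms: Sequence[float],
    tol_ms: float = 20.0,  # assumed tolerance, not stated in the paper
) -> float:
    """Fraction of automatic boundaries within tol_ms of the manual ones.

    Both sequences are assumed to label the same phoneme sequence, so the
    i-th automatic boundary is compared with the i-th manual boundary.
    """
    if len(auto_ms) != len(manual_ms):
        raise ValueError("boundary lists must have the same length")
    if not manual_ms:
        raise ValueError("boundary lists must not be empty")
    hits = sum(1 for a, m in zip(auto_ms, manual_ms) if abs(a - m) <= tol_ms)
    return hits / len(manual_ms)


# Toy boundary times in milliseconds (invented for illustration only).
manual = [120, 250, 410, 560]
monophone = [130, 285, 415, 600]
triphone = [125, 262, 405, 585]

print(boundary_accuracy(monophone, manual))
print(boundary_accuracy(triphone, manual))
```

Under a scheme of this kind, the reported accuracy would depend on the chosen tolerance, so figures scored with different tolerances would not be directly comparable.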