This paper introduces a new generation of speech synthesis technology based on acoustic statistical modeling, with emphasis on the contributions of the USTC iFlytek Speech Laboratory to its development. These contributions include: integrating articulatory and acoustic features to make acoustic parameter generation more flexible; replacing the maximum likelihood (ML) criterion with a minimum generation error (MGE) criterion to improve the quality of synthesized speech; and replacing reconstruction by a parametric synthesizer with unit selection and waveform concatenation, which fundamentally addresses the limited speech quality of HMM-based parametric synthesis. These innovations further improve new-generation speech synthesis in naturalness, expressiveness, flexibility, and multilingual implementation.
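To make the difference between the ML and MGE criteria concrete, here is a minimal Python sketch under several simplifying assumptions. It uses a 1-D static feature, one Gaussian per frame with fixed variances (no state tying or durations), a hypothetical delta window [-0.5, 0, 0.5], and plain gradient descent as the optimizer. The sketch generates a trajectory from the model means by ML parameter generation and measures the generation error as the squared distance to a natural trajectory. It then adjusts the means to reduce that error directly. It illustrates the criterion only and is not the laboratory's training procedure. All function names are hypothetical.

```python
"""Toy sketch of minimum generation error (MGE) training.

Assumptions (hypothetical, for illustration only): 1-D static feature,
one Gaussian per frame, delta window [-0.5, 0, 0.5], fixed diagonal
variances, plain gradient descent on the static/delta means.
"""


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def window_matrix(t_len):
    """W maps statics c (length T) to stacked [statics; deltas] (length 2T)."""
    w = [[0.0] * t_len for _ in range(2 * t_len)]
    for t in range(t_len):
        w[t][t] = 1.0  # static row
        if t > 0:
            w[t_len + t][t - 1] = -0.5  # delta row, truncated at the edges
        if t < t_len - 1:
            w[t_len + t][t + 1] = 0.5
    return w


def mlpg(w, mean, prec):
    """ML parameter generation: c = (W'PW)^-1 W'P mu, P = diag(prec)."""
    rows, cols = len(w), len(w[0])
    r_mat = [[sum(w[k][i] * prec[k] * w[k][j] for k in range(rows))
              for j in range(cols)] for i in range(cols)]
    r_vec = [sum(w[k][i] * prec[k] * mean[k] for k in range(rows))
             for i in range(cols)]
    return solve(r_mat, r_vec), r_mat


def generation_error(c, target):
    """Squared Euclidean distance between generated and natural statics."""
    return sum((a - b) ** 2 for a, b in zip(c, target))


def mge_train(mean, prec, target, lr=0.1, steps=200):
    """Gradient descent on the means to minimise the generation error."""
    w = window_matrix(len(target))
    mean = mean[:]
    for _ in range(steps):
        c, r_mat = mlpg(w, mean, prec)
        d_err = [2.0 * (a - b) for a, b in zip(c, target)]  # dD/dc
        g = solve(r_mat, d_err)  # R^-1 dD/dc (R is symmetric)
        for k in range(len(mean)):
            # dD/dmu_k = P_k * (W R^-1 dD/dc)_k
            mean[k] -= lr * prec[k] * sum(w[k][i] * g[i] for i in range(len(g)))
    return mean


if __name__ == "__main__":
    target = [0.0, 1.0, 2.0, 1.5, 0.5]  # toy "natural" static trajectory
    init = target + [0.0] * len(target)  # statics copied, deltas zeroed
    prec = [1.0] * 5 + [2.0] * 5  # hypothetical static/delta precisions
    w = window_matrix(len(target))
    print("before:", generation_error(mlpg(w, init, prec)[0], target))
    trained = mge_train(init, prec, target)
    print("after: ", generation_error(mlpg(w, trained, prec)[0], target))
```

The initial means copy the natural statics but set the deltas to zero. This mimics a model whose static and dynamic statistics are mutually inconsistent, so ML generation smooths the trajectory away from the natural one. MGE removes that mismatch by optimizing the means against the error of the generated parameters rather than their likelihood.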