This paper introduces speech synthesis based on statistical acoustic modeling and reviews the contributions of the USTC iFLYTEK Speech Laboratory to this frontier direction of the field. These contributions include: integrating articulatory features with acoustic features to make acoustic parameter generation more flexible; replacing the maximum likelihood criterion with a minimum generation error criterion to improve the quality of synthesized speech; and using unit selection and waveform concatenation in place of parametric synthesizer reconstruction to overcome the speech quality limitations of HMM-based parametric synthesis. These techniques further improve speech synthesis systems in naturalness, expressiveness, flexibility, and multilingual capability, and have promoted the wide application of speech synthesis in call-center information services, human-machine speech interaction on mobile embedded devices, and intelligent speech-enabled education.