To improve the intelligibility and naturalness of converted speech, this paper focuses on the prosodic characteristics of the speech signal during feature extraction and proposes a multi-time-scale prosodic feature extraction method together with its parameterized representation. Following a stepwise refinement strategy, prosodic features are analyzed and extracted at multiple time scales, which yields a detailed and complete description of prosody from the global level down to the local level and overcomes the ambiguity and complexity of representing prosodic information. Experimental results show that the proposed voice conversion system performs well in all four test types: compared with the existing Gaussian mixture model method, the ABX test result improves by 10.88% and the MOS score improves by 18.59% on average.