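The authors' previously proposed two-factor Gaussian process latent variable model (TF-GPLVM) has two shortcomings when applied to voice conversion: it ignores the dynamic characteristics of speech, and it requires many parameters to be estimated during training. To address these, we introduce a hidden Markov model (HMM) to model the speech dynamics and use the HMM hidden states to perform a probabilistic soft classification of each speech frame by linguistic content. The result is a two-factor Gaussian process dynamic model (TF-GPDM) with higher separation accuracy and a lower computational load. Based on this model, we design a new vocal tract spectrum conversion scheme built on speaker characteristics replacement. Subjective and objective experiments show that, compared with both the traditional statistical mapping and frequency warping methods and the TF-GPLVM method, the proposed method improves speech quality and conversion similarity and achieves a better balance between the two.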
In a previous work, we developed a two-factor Gaussian process latent variable model (TF-GPLVM) that performs spectral conversion by replacing speaker characteristics. Although it outperforms traditional mapping-based methods, the model has two drawbacks: 1) it cannot capture the dynamic characteristics of speech, and 2) it requires a large number of parameters to be estimated. To overcome these drawbacks, we combine TF-GPLVM with a hidden Markov model (HMM) to obtain an enhanced two-factor Gaussian process dynamic model (TF-GPDM). In this model, the speech dynamics are captured by the HMM state transition probabilities, while the HMM states are used to categorize speech frames into a limited number of phonetic content classes. Both subjective and objective evaluations show that, compared with traditional mapping-based methods such as the Gaussian mixture model (GMM) and frequency warping (FW), and with the TF-GPLVM-based method, the proposed TF-GPDM improves both speech quality and identity similarity and also reaches a better trade-off between the two.
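As a rough illustration of how HMM states can assign frames to content classes probabilistically, the sketch below computes per-frame state posteriors with a scaled forward-backward pass. It is not the TF-GPDM itself. The function names, the one-dimensional Gaussian emission model, and all numeric values are assumptions made for this example and do not come from the paper.

```python
import math


def gaussian_pdf(x, mean, var):
    """1-D Gaussian density; a stand-in for whatever emission model is really used."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def state_posteriors(obs, init, trans, means, variances):
    """Return gamma[t][k] = P(state k at frame t | all frames) for a toy HMM.

    obs        : list of scalar frame features (hypothetical)
    init       : initial state probabilities
    trans      : trans[j][k] = P(next state k | current state j)
    means/vars : per-state Gaussian emission parameters (assumed)
    """
    n_states, n_frames = len(init), len(obs)
    # Emission likelihood of every frame under every state.
    b = [[gaussian_pdf(x, means[k], variances[k]) for k in range(n_states)]
         for x in obs]

    # Forward pass, normalised at each frame to avoid numerical underflow.
    alpha, scale = [], []
    for t in range(n_frames):
        if t == 0:
            a = [init[k] * b[0][k] for k in range(n_states)]
        else:
            a = [b[t][k] * sum(alpha[t - 1][j] * trans[j][k] for j in range(n_states))
                 for k in range(n_states)]
        c = sum(a)
        alpha.append([v / c for v in a])
        scale.append(c)

    # Backward pass, reusing the forward scale factors.
    beta = [[1.0] * n_states for _ in range(n_frames)]
    for t in range(n_frames - 2, -1, -1):
        for j in range(n_states):
            beta[t][j] = sum(trans[j][k] * b[t + 1][k] * beta[t + 1][k]
                             for k in range(n_states)) / scale[t + 1]

    # Posterior (soft class membership) of each state at each frame.
    gamma = []
    for t in range(n_frames):
        g = [alpha[t][k] * beta[t][k] for k in range(n_states)]
        s = sum(g)
        gamma.append([v / s for v in g])
    return gamma


# Toy usage: two states with emission means 0 and 3, "sticky" transitions.
frames = [0.1, 0.0, 0.2, 2.9, 3.1, 3.0]
gamma = state_posteriors(
    frames,
    init=[0.5, 0.5],
    trans=[[0.9, 0.1], [0.1, 0.9]],
    means=[0.0, 3.0],
    variances=[1.0, 1.0],
)
for t, row in enumerate(gamma):
    print(t, [round(p, 3) for p in row])
```

Each row of `gamma` sums to one, so a frame can belong partly to several classes rather than being forced into a single one. The transition matrix is what lets neighbouring frames influence each other's class membership.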