To improve the performance of voice conversion, fundamental frequency (F0) transformation methods are investigated and an efficient F0 transformation algorithm is proposed. First, unlike traditional linear transformation methods, the relationship between F0 and the spectral parameters is analyzed: within each component of the Gaussian mixture model (GMM), F0 is predicted from the converted spectral parameters using support vector regression (SVR). Then, to reduce the over-smoothing caused by the statistical averaging of the GMM, a mixed transformation method is presented that combines SVR with the traditional mean-variance linear (MVL) conversion. In addition, the adaptive median filter, widely used in image processing, is adopted to resolve the discontinuities caused by frame-wise transformation. Objective and subjective experiments are carried out to evaluate the quality of the converted speech. The results show that the proposed method outperforms traditional F0 transformation methods in both similarity and the quality of the converted speech.
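As a rough illustration of two of the building blocks named above, the following sketch shows a mean-variance linear conversion of F0 and a one-dimensional adaptive median filter for smoothing a frame-wise F0 contour. It is a minimal sketch under stated assumptions, not the algorithm evaluated here: the abstract does not specify the exact formulation, so working in the log-F0 domain, marking unvoiced frames with 0, the window sizes, and the function names `mvl_convert` and `adaptive_median_filter` are all illustrative choices. The SVR-based prediction and its combination with MVL are not reproduced.

```python
import math
from statistics import median


def mvl_convert(src_f0, src_stats, tgt_stats):
    """Mean-variance linear conversion of F0 (assumed here: log-F0 domain).

    src_stats and tgt_stats are (mean, std) of log-F0 for each speaker.
    Frames with F0 <= 0 are treated as unvoiced and returned as 0.0.
    """
    mu_x, sigma_x = src_stats
    mu_y, sigma_y = tgt_stats
    out = []
    for f0 in src_f0:
        if f0 <= 0:
            out.append(0.0)
        else:
            log_f0 = mu_y + (sigma_y / sigma_x) * (math.log(f0) - mu_x)
            out.append(math.exp(log_f0))
    return out


def adaptive_median_filter(x, min_window=5, max_window=9):
    """1-D adaptation of the adaptive median filter from image processing.

    For each sample the window grows until its median is not an extreme
    value; the sample is then kept unless it is itself the window minimum
    or maximum, in which case the median replaces it. Edges are handled
    by repeating the first and last samples.
    """
    n = len(x)
    out = list(x)
    for i in range(n):
        half = min_window // 2
        while True:
            # Odd-length window with indices clamped to the valid range.
            window = [x[min(max(j, 0), n - 1)]
                      for j in range(i - half, i + half + 1)]
            z_min, z_max, z_med = min(window), max(window), median(window)
            if z_min < z_med < z_max:
                out[i] = x[i] if z_min < x[i] < z_max else z_med
                break
            half += 1
            if 2 * half + 1 > max_window:
                # Window limit reached: fall back to the last median.
                out[i] = z_med
                break
    return out


if __name__ == "__main__":
    # Hypothetical voiced contour (Hz) with one isolated spike at 180.
    contour = [100, 102, 104, 180, 108, 110, 112]
    print(adaptive_median_filter(contour))
    # Hypothetical speaker statistics: target mean one octave above source.
    print(mvl_convert([100, 0, 200],
                      (math.log(100), 0.2), (math.log(200), 0.2)))
```

In this sketch the filter leaves smoothly varying frames untouched and replaces only samples that are the extreme value of their neighbourhood, which is the property that makes an adaptive median filter attractive for removing isolated frame-wise jumps without flattening the whole contour.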