该文提出一种基于低秩约束的本征音子(Eigenphone)说话人自适应方法。原始的本征音子说话人自适应方法在自适应语料充分时具有很好的效果,然而当自适应语料不足时,出现严重的过拟合现象,导致自适应后的系统可能比自适应前的系统还要差。首先,对协方差矩阵为对角阵的隐马尔可夫-高斯混合模型语音识别系统,推导出一种简化的本征音子矩阵估计算法;然后,对本征音子矩阵引入低秩约束,采用矩阵的核范数作为矩阵秩的凸近似,通过调节核范数的权重因子以有效控制自适应模型的复杂度;最后,给出一种加速近点梯度算法以求解新算法中引入的带有核范数正则项的数学优化问题。汉语连续语音识别的说话人自适应实验表明,引入低秩约束后,本征音子说话人自适应方法的自适应效果得到了明显提高,在5~50 s的自适应数据条件下,均取得了比最大似然线性回归后接最大后验(MLLR+MAP)自适应更佳的识别效果。
A low-rank constraint eigenphone speaker adaptation method is proposed. Original eigenphone speaker adaptation method performs well when the amount of adaptation data is sufficient. However, it suffers from server overfitting when insufficient amount of adaptation data is provided, possibly resulting in lower performance than that of the unadapted system. Firstly, a simplified estimation alogrithm of the eigenphone matrix is deduced in case of hidden Markov model-Gaussian mixture model (HMM-GMM) based speech recognition system with diagonal covariance matrices. Then, a low-rank constraint is applied to estimation of the eigenphone matrix. The nuclear norm is used as a convex approximation of the rank of a matrix. The weight of the norm is adjusted to control the complexity of the adaptation model. Finally, an accelerated proximal gradient method is adopted to solve the mathematic optimization. Experiments on an Mandarin Chinese continuous speech recognition task show that, the performance of the original eigenphone method is improved remarkably. The new method outperforms the maximum likelihood linear regression followed by maximum a posterriori (MLLR+MAP) methods under 5~50 s adaptation data testing conditions.