

Low-Rank Constrained Eigenphone Speaker Adaptation Method for Speech Recognition

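ISSN: 1009-5896
Journal: Journal of Electronics & Information Technology (电子与信息学报)
Date: 2014.4.15
Pages: 981-987
Classification: TN912.3 [Electronics and Telecommunications: Communication and Information Systems; Electronics and Telecommunications: Information and Communication Engineering]
Author affiliation: [1] Institute of Information System Engineering, PLA Information Engineering University, Zhengzhou 450002
Funding: Supported by the National Natural Science Foundation of China (61175017) and the National 863 Program of China (2012AA011603)
Related project: Continuous speech recognition based on segmental conditional random fields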

Keywords: Speech recognition, Speaker adaptation, Eigenphone, Low-rank constraint, Proximal gradient method

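Chinese abstract (translated):

This paper proposes an eigenphone speaker adaptation method based on a low-rank constraint. The original eigenphone speaker adaptation method works well when adaptation data are sufficient; when adaptation data are insufficient, however, it overfits severely, so that the adapted system may perform worse than the unadapted one. First, for hidden Markov model-Gaussian mixture model (HMM-GMM) speech recognition systems with diagonal covariance matrices, a simplified algorithm for estimating the eigenphone matrix is derived. Then, a low-rank constraint is imposed on the eigenphone matrix, with the nuclear norm used as a convex approximation of the matrix rank; adjusting the weight of the nuclear norm effectively controls the complexity of the adaptation model. Finally, an accelerated proximal gradient algorithm is given to solve the optimization problem with the nuclear-norm regularization term that the new algorithm introduces. Speaker adaptation experiments on Mandarin Chinese continuous speech recognition show that the low-rank constraint clearly improves the adaptation performance of the eigenphone method, which achieves better recognition results than maximum likelihood linear regression followed by maximum a posteriori (MLLR+MAP) adaptation under all conditions with 5~50 s of adaptation data.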

English abstract:

A low-rank constrained eigenphone speaker adaptation method is proposed. The original eigenphone speaker adaptation method performs well when the amount of adaptation data is sufficient. However, it suffers from severe overfitting when the amount of adaptation data is insufficient, possibly resulting in lower performance than that of the unadapted system. Firstly, a simplified estimation algorithm for the eigenphone matrix is derived for hidden Markov model-Gaussian mixture model (HMM-GMM) based speech recognition systems with diagonal covariance matrices. Then, a low-rank constraint is applied to the estimation of the eigenphone matrix. The nuclear norm is used as a convex approximation of the rank of a matrix, and the weight of the norm is adjusted to control the complexity of the adaptation model. Finally, an accelerated proximal gradient method is adopted to solve the resulting mathematical optimization problem. Experiments on a Mandarin Chinese continuous speech recognition task show that the performance of the original eigenphone method is improved remarkably. The new method outperforms the maximum likelihood linear regression followed by maximum a posteriori (MLLR+MAP) method under 5~50 s adaptation data testing conditions.
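The abstract does not give the objective function, so the following is only a minimal sketch of the general technique it names: an accelerated proximal gradient (FISTA-style) iteration for a nuclear-norm-regularized problem. As an assumption made purely for illustration, the paper's likelihood-based data term is replaced by a least-squares loss, 0.5 * ||A V - B||_F^2 + lam * ||V||_*. The names `svt`, `apg_nuclear`, `A`, `B`, and `lam` are hypothetical and do not come from the paper.

```python
import numpy as np


def svt(M, tau):
    """Singular value thresholding: the proximal operator of tau * ||.||_*."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s = np.maximum(s - tau, 0.0)  # soft-threshold the singular values
    return (U * s) @ Vt


def apg_nuclear(A, B, lam, n_iter=200):
    """Accelerated proximal gradient for
    min_V 0.5 * ||A V - B||_F^2 + lam * ||V||_*.

    The least-squares loss is an illustrative stand-in, not the paper's
    actual objective.
    """
    L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the smooth gradient
    V = np.zeros((A.shape[1], B.shape[1]))
    Y, t = V.copy(), 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ Y - B)                # gradient of the smooth term at Y
        V_new = svt(Y - grad / L, lam / L)      # proximal (shrinkage) step
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        Y = V_new + ((t - 1.0) / t_new) * (V_new - V)  # momentum extrapolation
        V, t = V_new, t_new
    return V
```

In this sketch, a larger `lam` zeroes out more singular values and so yields a lower-rank estimate, which mirrors the role the abstract assigns to the nuclear-norm weight in controlling model complexity.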
