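The eigenphone speaker adaptation method suffers from severe overfitting when the adaptation data are insufficient. To address this, an eigenphone speaker adaptation algorithm based on a sparse group LASSO constraint is proposed. First, the basic principle of eigenphone speaker adaptation under the hidden Markov model-Gaussian mixture model (HMM-GMM) framework is presented. Then, sparse group LASSO regularization is introduced into eigenphone speaker adaptation: the model complexity is controlled by adjusting the weighting factors, and the resulting problem is solved with an accelerated proximal gradient optimization algorithm. Finally, the sparse group LASSO constrained adaptation algorithm is compared with several current regularized adaptation methods. Speaker adaptation experiments on Mandarin Chinese continuous speech recognition show that introducing the sparse group LASSO constraint clearly improves the performance of eigenphone speaker adaptation, and that the sparse group LASSO constraint outperforms l1, l2, and elastic net regularization.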
The original eigenphone speaker adaptation method performs well when sufficient adaptation data are available, but it suffers from severe overfitting when the adaptation data are insufficient. A sparse group LASSO (SGL) constrained eigenphone speaker adaptation method is proposed. First, the principle of eigenphone speaker adaptation is introduced for a hidden Markov model-Gaussian mixture model (HMM-GMM) based speech recognition system. Then, a sparse group LASSO constraint is applied to the estimation of the eigenphone matrix, with the weight of the SGL norm adjusted to control the complexity of the adaptation model. An accelerated proximal gradient method is adopted to solve the resulting optimization problem. Finally, the method is compared with current norm-regularized adaptation algorithms. Experiments on a Mandarin Chinese continuous speech recognition task show that the SGL constrained eigenphone method remarkably improves performance over the original eigenphone method and is also superior to the l1-norm, l2-norm, and elastic net constrained methods.
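To make the two ingredients named above concrete, the following is a minimal sketch of a sparse group LASSO proximal operator and a FISTA-style accelerated proximal gradient loop. It is not the paper's implementation: the eigenphone objective for HMM-GMM systems is not reproduced here, so a plain least-squares data term stands in for it, and the grouping of coefficients, the function names (`prox_sgl`, `apg_sgl`), and the conservative step size are all assumptions made for illustration.

```python
import math


def prox_sgl(v, groups, t, lam1, lam2):
    """Proximal operator of t * (lam1 * ||v||_1 + lam2 * sum_g ||v_g||_2).

    `groups` is a list of index lists that partition v (an assumed grouping).
    """
    # Step 1: element-wise soft-thresholding handles the l1 term.
    s = [math.copysign(max(abs(x) - t * lam1, 0.0), x) for x in v]
    # Step 2: group soft-thresholding shrinks each group as a block.
    out = list(s)
    for g in groups:
        norm = math.sqrt(sum(s[i] ** 2 for i in g))
        scale = max(1.0 - t * lam2 / norm, 0.0) if norm > 0.0 else 0.0
        for i in g:
            out[i] = s[i] * scale
    return out


def matvec(A, x):
    """Dense matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def apg_sgl(A, b, groups, lam1, lam2, n_iter=300):
    """Accelerated proximal gradient (FISTA-style) for
    0.5 * ||A v - b||^2 + lam1 * ||v||_1 + lam2 * sum_g ||v_g||_2.

    The least-squares term is a stand-in for the adaptation objective.
    """
    At = [list(col) for col in zip(*A)]
    # Conservative step: the squared Frobenius norm upper-bounds the
    # Lipschitz constant of the gradient of the least-squares term.
    step = 1.0 / sum(a * a for row in A for a in row)
    v = [0.0] * len(A[0])
    y = list(v)
    tk = 1.0
    for _ in range(n_iter):
        residual = [p - q for p, q in zip(matvec(A, y), b)]
        grad = matvec(At, residual)
        # Gradient step on the smooth term, then the SGL proximal step.
        point = [yi - step * gi for yi, gi in zip(y, grad)]
        v_new = prox_sgl(point, groups, step, lam1, lam2)
        # Momentum update that gives the accelerated rate.
        tk_new = 0.5 * (1.0 + math.sqrt(1.0 + 4.0 * tk * tk))
        mom = (tk - 1.0) / tk_new
        y = [vn + mom * (vn - vo) for vn, vo in zip(v_new, v)]
        v, tk = v_new, tk_new
    return v
```

In this sketch, `lam1` drives individual coefficients to zero and `lam2` drives whole groups to zero, which is how adjusting the SGL weights controls model complexity. For instance, calling `apg_sgl` with a 4x4 identity matrix, `b = [3.0, 4.0, 0.2, -0.1]`, and groups `[[0, 1], [2, 3]]` zeroes out the second group entirely while only shrinking the first.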