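To improve the stability of frame-based feature transform methods, a segment-based discriminative feature transform method is proposed. The method treats feature transformation as a sparse approximation problem for a high-dimensional signal. Tied-state training is used to obtain Region Dependent Linear Transform (RDLT) matrices and mean-offset feature Minimum Phone Error (m-fMPE) transform matrices, and the two sets of transform matrices together form an over-complete dictionary. The speech signal is segmented by forced alignment; with likelihood maximization as the objective function, the matching pursuit algorithm optimizes the objective iteratively and automatically determines the transform matrices and their coefficients for each speech segment. To ensure the stability of the feature transform, a correlation measurement is introduced when selecting the transform matrices, so that correlated feature basis vectors are removed. Experimental results show that, compared with the traditional RDLT method, recognition performance improves by 1.63% when the acoustic model is trained with the maximum likelihood criterion and by 2.23% when it is trained with a discriminative criterion. The method can also be applied to speech enhancement and to discriminative model training.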
A discriminative segmental feature transform method is proposed to improve the stability of the frame-based method. The feature transform is treated as a sparse approximation problem for a high-dimensional signal. First, a set of feature transform matrices is estimated by tied-state training of RDLT (Region Dependent Linear Transform) and m-fMPE (mean-offset feature Minimum Phone Error), and the transform matrices are combined into an over-complete dictionary. Then, the speech signal is segmented by forced alignment. Finally, matching pursuit is used to optimize the likelihood objective function iteratively: the transform matrices for each segment are selected from the dictionary, and the corresponding coefficients are determined automatically during the optimization. In addition, to guarantee the stability of the transform matrices, a correlation measurement is introduced to remove correlated basis vectors during the iterations. The experimental results show that, compared with the traditional RDLT method, the recognition performance improves by 1.63% when the acoustic model is trained with the maximum likelihood criterion and by 2.23% when it is trained with a discriminative criterion. The method can also be applied to speech enhancement and to discriminative model training.
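The selection step described above can be illustrated with a minimal sketch of generic matching pursuit plus a correlation check. This is not the paper's implementation: it assumes a squared-error residual in place of the likelihood objective, represents dictionary entries as plain column vectors rather than transform matrices, and the names `matching_pursuit`, `max_corr`, and `tol` are hypothetical.

```python
import numpy as np


def matching_pursuit(signal, dictionary, n_iter=10, max_corr=0.9, tol=1e-8):
    """Greedy matching pursuit that skips atoms correlated with chosen ones.

    signal:     (d,) target vector.
    dictionary: (d, k) matrix whose nonzero columns are the atoms.
    Returns (indices, coefficients) with respect to the original columns.
    Assumption: squared-error residual stands in for a likelihood objective.
    """
    norms = np.linalg.norm(dictionary, axis=0)
    atoms = dictionary / norms              # unit-norm copies of the atoms
    residual = np.asarray(signal, dtype=float).copy()
    banned = np.zeros(atoms.shape[1], dtype=bool)
    coeffs = {}

    for _ in range(n_iter):
        # Projection of the current residual onto every atom.
        scores = atoms.T @ residual
        scores[banned] = 0.0
        best = int(np.argmax(np.abs(scores)))
        if abs(scores[best]) < tol:
            break                           # nothing left to explain

        # Accumulate the coefficient and subtract the explained part.
        coeffs[best] = coeffs.get(best, 0.0) + scores[best]
        residual -= scores[best] * atoms[:, best]

        # Correlation check: exclude atoms too similar to the chosen one.
        corr = np.abs(atoms.T @ atoms[:, best])
        corr[best] = 0.0                    # the chosen atom stays usable
        banned |= corr > max_corr

    indices = np.array(sorted(coeffs), dtype=int)
    # Rescale so the coefficients apply to the unnormalized columns.
    values = np.array([coeffs[i] / norms[i] for i in indices])
    return indices, values
```

In this sketch an atom is excluded once its absolute cosine similarity with an already selected atom exceeds `max_corr`, which keeps the selected set from containing near-duplicate basis vectors. In the setting of the abstract, the same loop would run once per aligned speech segment, with the projection score replaced by the gain in likelihood.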