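To improve speech recognition accuracy, a speech recognition method based on subspace region-dependent feature transformation and fusion (the MFCC-BN-TC method) is proposed. The method extracts a short-time spectral structure feature (BN) and an envelope feature (MFCC) to describe the structure and the envelope of the short-time speech spectrum, respectively, and applies a region-dependent feature transformation to each of the BN and MFCC features. This transformation is then generalized into a subspace region-dependent feature transformation, so that different time granularities (frame and speech segment) can be used for multi-level discriminative feature representation. Finally, the features obtained from the multiple discriminative transformations are jointly represented to train the acoustic model, and a general framework for discriminative feature transformation and fusion is given. Experimental results show that, compared with the method using the raw BN feature and the baseline system using the MFCC feature, the MFCC-BN-TC method improves recognition performance by 0.98% and 1.62%, respectively; fusing the speech features transformed by the MFCC-BN-TC method improves the recognition rate by 1.5% over fusing the raw features.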
A speech recognition method based on subspace region-dependent feature transformation and combination (MFCC-BN-TC) is proposed to improve recognition accuracy. A structure feature (BN) and an envelope feature (MFCC) are extracted to describe the structure and the envelope of the short-time speech spectrum, respectively, and a region-dependent feature transformation is applied to each of the BN and MFCC features. The transformation is then generalized into a subspace region-dependent feature transformation, so that different time units (frame and segment) can be used for multi-level modeling. Moreover, a general framework for feature transformation and combination is proposed, and the acoustic model is trained on the combined features after transformation. Experimental results show that, compared with the method using the raw BN feature and the baseline method based on the MFCC feature, the recognition rate of the MFCC-BN-TC method increases by 0.96% and 1.62%, respectively. Combining the features transformed by the MFCC-BN-TC method improves the recognition rate by 1.5% over combining the raw features.
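As a rough illustration of the transformation-and-combination idea, the following minimal sketch assumes the commonly used posterior-weighted form of a region-dependent transform, y = Σ_r γ_r(x)(A_r x + b_r), where the regions are diagonal-covariance Gaussians; the abstract itself does not state this formula. The function names (`region_posteriors`, `region_dependent_transform`, `fuse`) and all parameter values are hypothetical, and the subspace and segment-level extensions are not shown.

```python
import math

def region_posteriors(x, means, variances, weights):
    """Posterior gamma_r(x) of each diagonal-Gaussian region (assumed region model)."""
    log_scores = []
    for mu, var, w in zip(means, variances, weights):
        ll = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            ll += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        log_scores.append(ll)
    top = max(log_scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in log_scores]
    total = sum(exps)
    return [e / total for e in exps]

def region_dependent_transform(x, posteriors, mats, biases):
    """Posterior-weighted sum of per-region affine transforms: sum_r gamma_r (A_r x + b_r)."""
    out_dim = len(biases[0])
    y = [0.0] * out_dim
    for gamma, A, b in zip(posteriors, mats, biases):
        for i in range(out_dim):
            y[i] += gamma * (sum(a * xi for a, xi in zip(A[i], x)) + b[i])
    return y

def fuse(*streams):
    """Combine per-frame feature vectors from several streams by concatenation."""
    return [v for s in streams for v in s]

# Toy one-dimensional frames standing in for the MFCC and BN streams (made-up values).
means, variances, weights = [[0.0], [4.0]], [[1.0], [1.0]], [0.5, 0.5]
mats, biases = [[[1.0]], [[1.0]]], [[0.0], [0.0]]  # identity transforms, zero bias

mfcc_frame, bn_frame = [1.0], [3.0]
mfcc_t = region_dependent_transform(
    mfcc_frame, region_posteriors(mfcc_frame, means, variances, weights), mats, biases)
bn_t = region_dependent_transform(
    bn_frame, region_posteriors(bn_frame, means, variances, weights), mats, biases)
combined = fuse(mfcc_t, bn_t)  # vector that would be passed to acoustic model training
```

In practice the per-region matrices and biases would be trained discriminatively rather than fixed by hand, and each stream would have its own region model; the identity transforms here only keep the toy example easy to check.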