To improve the accuracy of speaker recognition, multiple feature parameters can be used simultaneously. However, the individual dimensions of a combined feature vector may affect the recognition result differently, so treating them all equally is not necessarily optimal. To address this problem, a mixed feature extraction method based on the Fisher criterion was proposed. It combines Mel Frequency Cepstrum Coefficients (MFCC), Linear Prediction Mel Frequency Cepstrum Coefficients (LPMFCC) and Teager Energy Operator Cepstrum Coefficients (TEOCC). First, the MFCC, LPMFCC and TEOCC parameters were extracted from the speech signal. Then, the Fisher ratio of each dimension of the MFCC and LPMFCC parameters was calculated. The six components with the highest Fisher ratios were selected from each of the two parameter sets and combined with the TEOCC parameters to form the mixed feature. Finally, speaker recognition experiments were carried out on the TIMIT acoustic-phonetic continuous speech corpus and the NOISEX-92 noise database. The simulation results were compared against five baselines: MFCC, LPMFCC, MFCC+LPMFCC, a Fisher-ratio-based mixed MFCC feature extraction method, and a feature extraction method based on Principal Component Analysis (PCA). Relative to these baselines, the proposed method's average recognition rate with a Gaussian Mixture Model (GMM) and a Back Propagation (BP) neural network is higher by 21.65, 18.39, 15.61, 15.01 and 22.70 percentage points respectively on clean speech. In a 30 dB noise environment, it is higher by 15.15, 10.81, 8.69, 7.64 and 17.76 percentage points respectively. These results show that the mixed feature effectively improves the speaker recognition rate and is more robust.
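The feature-selection step described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not give the Fisher ratio formula, so the code assumes the commonly used per-dimension definition: the variance of the per-speaker means divided by the average within-speaker variance. The function names `teager_energy`, `fisher_ratios`, `top_k_dims` and `mixed_feature` are hypothetical. Only the discrete Teager energy operator, psi[n] = x[n]^2 - x[n-1]*x[n+1], is shown. The remaining cepstral steps of TEOCC are not shown, and neither is MFCC/LPMFCC extraction. Those vectors are taken as given per-frame lists.

```python
import math
from statistics import fmean


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns len(x) - 2 values. The first and last samples lack a neighbour
    on one side, so they are skipped.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def fisher_ratios(features_by_speaker):
    """Per-dimension Fisher ratio (ASSUMED definition, not taken from the paper).

    features_by_speaker: one entry per speaker. Each entry is a list of
    frames, and each frame is a list of D coefficients (e.g. D MFCCs).
    Returns D ratios: the between-speaker variance of the speaker means
    divided by the average within-speaker variance.
    """
    dims = len(features_by_speaker[0][0])
    ratios = []
    for k in range(dims):
        # Mean of dimension k for each speaker.
        means = [fmean(frame[k] for frame in frames) for frames in features_by_speaker]
        grand_mean = fmean(means)
        # Spread of the speaker means around the grand mean.
        between = fmean((m - grand_mean) ** 2 for m in means)
        # Average spread of each speaker's frames around that speaker's own mean.
        within = fmean(
            fmean((frame[k] - m) ** 2 for frame in frames)
            for frames, m in zip(features_by_speaker, means)
        )
        ratios.append(between / within if within > 0 else math.inf)
    return ratios


def top_k_dims(ratios, k=6):
    """Indices of the k highest-ratio dimensions, returned in ascending order."""
    ranked = sorted(range(len(ratios)), key=lambda i: ratios[i], reverse=True)
    return sorted(ranked[:k])


def mixed_feature(mfcc_frame, lpmfcc_frame, teocc_frame, mfcc_idx, lpmfcc_idx):
    """Concatenate selected MFCC dims, selected LPMFCC dims and the full TEOCC vector."""
    return ([mfcc_frame[i] for i in mfcc_idx]
            + [lpmfcc_frame[i] for i in lpmfcc_idx]
            + list(teocc_frame))


# Toy data: 2 speakers, 3 frames each, 3 dimensions. Dimension 0 separates
# the speakers. Dimensions 1 and 2 have the same mean for both speakers.
spk_a = [[1.0, 0.0, 5.0], [1.2, 1.0, 3.0], [0.8, -1.0, 4.0]]
spk_b = [[5.0, 0.5, 4.5], [5.2, -0.5, 3.5], [4.8, 0.0, 4.0]]
ratios = fisher_ratios([spk_a, spk_b])
print(top_k_dims(ratios, k=1))  # [0]
```

In this sketch, the dimension indices are chosen once from the training frames and then reused for every frame. As a result, every mixed vector has the same length: six MFCC dimensions, six LPMFCC dimensions, and all TEOCC dimensions.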