To address the problems of extracting effective speech features and of noise interference in speaker recognition systems, a new speech feature extraction method is proposed: Mel Frequency Cepstral Coefficients based on the S-transform (SMFCC). Building on traditional Mel Frequency Cepstral Coefficients (MFCC), the method exploits the two-dimensional Time-Frequency (TF) multiresolution property of the S-transform and the effective denoising of the two-dimensional TF matrix by Singular Value Decomposition (SVD), and combines them with related statistical analysis methods to obtain the final speech features. The proposed features were compared experimentally with existing features on the TIMIT corpus. The Equal Error Rate (EER) and Minimum Detection Cost Function (Min DCF) of SMFCC were both lower than those of Linear Prediction Cepstral Coefficients (LPCC), MFCC, and their combination LMFCC; compared with MFCC, the EER and Min DCF08 of SMFCC decreased by 3.6% and 17.9%, respectively. The experimental results show that the proposed method effectively removes noise from the speech signal and improves local resolution.
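The two building blocks named in the abstract, the S-transform and SVD-based denoising of the time-frequency matrix, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `s_transform` and `svd_denoise`, the use of the magnitude matrix, the retained rank, and the synthetic test tone are all assumptions made for illustration, and the mel filterbank, cepstral, and statistical steps of SMFCC are omitted because the abstract does not specify them.

```python
import numpy as np


def s_transform(x):
    """Discrete S-transform of a 1-D real signal (illustrative sketch).

    Returns a complex array of shape (len(x) // 2 + 1, len(x)):
    rows are frequency bins 0..N/2, columns are time samples.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    # Signed integer frequency indices in FFT order: 0, 1, ..., -2, -1.
    m = np.fft.fftfreq(n) * n
    st = np.empty((n // 2 + 1, n), dtype=complex)
    st[0] = x.mean()  # the zero-frequency row is the signal mean
    for k in range(1, n // 2 + 1):
        # Frequency-domain Gaussian whose width scales with the bin index k.
        window = np.exp(-2.0 * np.pi**2 * m**2 / k**2)
        # Shift the spectrum by k bins, window it, return to the time domain.
        st[k] = np.fft.ifft(np.roll(spectrum, -k) * window)
    return st


def svd_denoise(matrix, rank):
    """Keep only the `rank` largest singular components of a 2-D matrix."""
    u, s, vh = np.linalg.svd(matrix, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vh[:rank]


# Hypothetical example: a noisy 20-cycle tone, 256 samples long.
rng = np.random.default_rng(0)
n = 256
t = np.arange(n)
noisy = np.cos(2 * np.pi * 20 * t / n) + 0.3 * rng.standard_normal(n)

tf_mag = np.abs(s_transform(noisy))      # time-frequency magnitude matrix
denoised = svd_denoise(tf_mag, rank=2)   # rank 2 is an arbitrary choice here
print(tf_mag.shape, denoised.shape)
```

Truncating the SVD keeps the dominant time-frequency structure and discards the small singular components, which is one common way to suppress broadband noise; how the retained rank is chosen in SMFCC is not stated in the abstract.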