Environmental mismatch seriously degrades speaker recognition performance, and the problem is more pronounced under non-stationary noise. To improve the robustness of speaker recognition under environmental mismatch, we propose a method based on sparse representation for generating a high-dimensional robust speech feature, together with a speaker model designed for the sparsity of that feature. In the proposed method, an optimized joint basis is first used as the basis of the sparse representation; the signal is decomposed over this basis to strip the noise component from the noisy speech and to extract the intrinsic time-frequency structure of the speech signal. A robust sparse-spectrum speech feature is then derived from this decomposition, and a speaker model based on a mixture of k-means is built to suit the high-dimensional, sparse nature of the feature. Experimental results show that, compared with a baseline system based on Mel-frequency cepstral coefficient (MFCC) features, the proposed method reduces the equal error rate (EER) by 28.16% on the NIST SRE-2003 corpus, and by 9.84% and 14.21% on the Chinese-863 corpus under non-stationary car noise at signal-to-noise ratios (SNRs) of 5 dB and 0 dB, respectively. These results indicate that, under environmental mismatch, the proposed method clearly outperforms the MFCC-based baseline system.
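The decomposition step described above (representing noisy speech sparsely over a joint basis, then keeping only the speech part) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the sparse solver or how the joint basis is optimized, so the sketch assumes a plain orthogonal matching pursuit (OMP) and a toy random orthonormal basis whose columns are split into hypothetical "speech" and "noise" atoms. The function names `omp` and `split_speech_noise` and all parameters are illustrative assumptions.

```python
import numpy as np


def omp(D, x, n_nonzero):
    """Orthogonal matching pursuit (assumed solver, not from the paper).

    Approximates x as a combination of at most n_nonzero columns of D.
    """
    residual = x.copy()
    support = []
    sol = np.zeros(0)
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        corr = np.abs(D.T @ residual)
        if support:
            corr[support] = 0.0  # never select the same atom twice
        support.append(int(np.argmax(corr)))
        # Refit all selected atoms by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
        if np.linalg.norm(residual) < 1e-10:
            break
    coef = np.zeros(D.shape[1])
    coef[support] = sol
    return coef


def split_speech_noise(x, D_speech, D_noise, n_nonzero):
    """Decompose x over the joint basis [D_speech, D_noise] and keep
    only the coefficients that fall on the speech atoms."""
    D = np.hstack([D_speech, D_noise])
    D = D / np.linalg.norm(D, axis=0, keepdims=True)  # unit-norm atoms
    coef = omp(D, x, n_nonzero)
    n_s = D_speech.shape[1]
    speech_coef = coef[:n_s]                 # sparse "speech" code
    speech_est = D[:, :n_s] @ speech_coef    # signal with noise part removed
    return speech_coef, speech_est


# Toy data: a random orthonormal basis split into two halves (hypothetical).
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((16, 16)))
D_speech, D_noise = Q[:, :8], Q[:, 8:]

clean = 2.0 * D_speech[:, 1] + 1.5 * D_speech[:, 4]   # "speech" component
noisy = clean + 1.0 * D_noise[:, 2]                   # add a "noise" atom

speech_coef, speech_est = split_speech_noise(noisy, D_speech, D_noise, n_nonzero=3)
print("non-zero speech coefficients at:", np.flatnonzero(np.abs(speech_coef) > 1e-8))
```

In this toy setting the joint basis is orthonormal, so the speech and noise parts separate exactly; with real, overlapping speech and noise dictionaries the separation would only be approximate, which is presumably why the paper optimizes the joint basis. The sparse speech coefficients stand in for the high-dimensional sparse feature that a clustering-based speaker model would then consume.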