Speaker recognition systems are vulnerable to attacks by impostors, and one common attack is to deliberately break into the system with speech recorded on high-fidelity recording equipment. Different devices carry different channel information, and the silent segments of speech contain complete channel information that is unaffected by factors such as the speaker and the text. It is therefore possible to judge whether an utterance is a replayed high-fidelity recording by checking whether the channel information in its silent segments is the same. The adaptive sub-band spectral entropy method is applied to extract the silent segments of speech, and improved MFCCs are used as channel-information features to build a Gaussian mixture model-universal background model (GMM-UBM). Experimental results show that adding the silence detection module to the speaker recognition system greatly reduces the equal error rate on impostor speech, which improves the security of the system.
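The silence-extraction step can be illustrated with a minimal sketch. It is not the adaptive sub-band spectral entropy method of the paper, whose details are not given here. It is a simplified variant that assumes fixed equal-width bands and a midpoint threshold. The function names `subband_spectral_entropy` and `silent_frame_mask`, the frame length, the hop size and the band count are all hypothetical choices made for illustration. The idea is that a noise-like silent frame spreads its energy evenly across bands (high entropy), while a frame containing voiced speech or a tone concentrates it in a few bands (low entropy).

```python
import numpy as np


def subband_spectral_entropy(signal, frame_len=256, hop=128, n_bands=16):
    """Return one sub-band spectral entropy value per frame of `signal`."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    bins_per_band = (frame_len // 2) // n_bands
    entropies = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop:i * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        # Skip the DC bin, then sum the remaining bins into equal-width bands.
        usable = power[1:1 + bins_per_band * n_bands]
        bands = usable.reshape(n_bands, bins_per_band).sum(axis=1)
        # Normalise band energies into a probability distribution.
        p = bands / (bands.sum() + 1e-12)
        entropies[i] = -np.sum(p * np.log(p + 1e-12))
    return entropies


def silent_frame_mask(entropies):
    """Mark frames whose entropy lies above the midpoint of the observed range.

    Assumption: the midpoint rule stands in for the paper's adaptive threshold.
    """
    threshold = 0.5 * (entropies.min() + entropies.max())
    return entropies > threshold


# Synthetic check: 0.5 s of faint noise ("silence") followed by 0.5 s of a tone.
rate = 8000
rng = np.random.default_rng(0)
noise = 0.01 * rng.standard_normal(rate)
t = np.arange(rate // 2) / rate
signal = noise.copy()
signal[rate // 2:] += np.sin(2 * np.pi * 440 * t)

mask = silent_frame_mask(subband_spectral_entropy(signal))
print(f"{mask.sum()} of {mask.size} frames flagged as silence")
```

In a full pipeline along the lines described above, the frames flagged by the mask would be passed on to the MFCC-based channel feature extraction and scored against the GMM-UBM. Those stages are omitted from this sketch.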