This work studies the ETSI speech enhancement system. The system relies on conventional Wiener filtering, which reduces noise well at high signal-to-noise ratios (SNRs) but performs poorly at low SNRs and does not suppress impulsive noise effectively. Computational auditory scene analysis (CASA), which models the characteristics of human hearing, can compensate for this weakness. Building on the ETSI algorithm and combining it with CASA, we propose a new speech enhancement algorithm. It moves the Wiener filter parameter estimation (feature extraction and spectrum estimation) from the original Mel domain to the Gammatone domain, and then uses an estimate of the ideal ratio mask (IRM) to separate speech from noise in the noisy signal and to suppress impulsive noise. Experiments on a noisy subset of the TIMIT corpus show that, compared with the original system, the proposed algorithm achieves considerably higher objective perceptual quality at low SNRs and clearly suppresses impulsive noise. At low SNRs it also reduces the word error rate of the back-end speech recognition system.
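To make the ideal ratio mask (IRM) concrete, the following is a minimal sketch of how such a mask is commonly defined and applied to a time-frequency representation. It is not the implementation evaluated in this work. The function names `ideal_ratio_mask` and `apply_mask`, the exponent `beta` (0.5 is a common choice), and the toy values are assumptions for illustration. The sketch also uses oracle speech and noise powers, whereas a practical system has to estimate the mask from the noisy signal alone.

```python
from typing import List

Matrix = List[List[float]]  # rows = time frames, columns = frequency channels


def ideal_ratio_mask(speech_power: Matrix, noise_power: Matrix,
                     beta: float = 0.5) -> Matrix:
    """Common IRM definition (assumed here): (S / (S + N)) ** beta per T-F unit.

    speech_power and noise_power hold the per-unit energies of the clean
    speech and of the noise, e.g. at the outputs of a Gammatone filterbank.
    """
    mask = []
    for s_row, n_row in zip(speech_power, noise_power):
        row = []
        for s, n in zip(s_row, n_row):
            total = s + n
            # A unit with no energy at all carries no speech, so mask it out.
            row.append((s / total) ** beta if total > 0.0 else 0.0)
        mask.append(row)
    return mask


def apply_mask(noisy_magnitude: Matrix, mask: Matrix) -> Matrix:
    """Scale each T-F unit of the noisy signal by its mask value in [0, 1]."""
    return [[m * g for m, g in zip(m_row, g_row)]
            for m_row, g_row in zip(noisy_magnitude, mask)]


if __name__ == "__main__":
    # Toy 2x2 example: two frames, two channels (illustrative values only).
    speech = [[4.0, 0.0], [1.0, 9.0]]
    noise = [[4.0, 1.0], [3.0, 0.0]]
    mask = ideal_ratio_mask(speech, noise, beta=1.0)
    print(mask)
    print(apply_mask([[2.0, 2.0], [2.0, 2.0]], mask))
```

Units dominated by noise receive a mask value near 0 and units dominated by speech a value near 1. This is why a short, high-energy noise burst that swamps the speech in a few frames is strongly attenuated.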