An identity-vector (i-vector) based normalization method for deep neural network adaptation

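ISSN: 2095-6134
Journal: Journal of University of Chinese Academy of Sciences (《中国科学院大学学报》)
Date: 0
Classification: TN912.3 [Electronics and Telecommunications: Communication and Information Systems; Electronics and Telecommunications: Information and Communication Engineering]
Author affiliations: [1] Department of Electronic Engineering, Tsinghua University, Beijing 100084; [2] Department of Electrical and Computer Engineering, Marquette University, Milwaukee 53233
Funding: Supported by the National Natural Science Foundation of China (61370034, 61273268, 61403224).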

Keywords: speech enhancement, computational auditory scene analysis, Gammatone filter, ideal ratio mask

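Abstract (translated from the Chinese):

This study takes the ETSI speech enhancement system as its subject. The system uses conventional Wiener filtering: it reduces noise well at high signal-to-noise ratios (SNR), but at low SNR its noise reduction is weak and it suppresses impulse noise poorly. Computational auditory scene analysis (CASA), which models the characteristics of human hearing, can largely compensate for this weakness. Building on the ETSI algorithm and incorporating CASA, the authors therefore propose a new algorithm that moves the estimation of the Wiener filter parameters from the original Mel domain to the Gammatone domain and further uses an estimate of the ideal ratio mask (IRM) to separate speech from noise in the noisy signal and suppress impulse noise. Experiments on the TIMIT speech corpus show that, compared with the original algorithm, the proposed algorithm substantially improves perceived quality at low SNR and markedly suppresses impulse noise. At low SNR, the recognition rate of the back-end speech recognition system also improves.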

English abstract:

This work studies the ETSI speech enhancement system, which uses a traditional Wiener filter for noise reduction. The system performs well when the signal-to-noise ratio (SNR) is high enough, but when the SNR drops it fails to suppress impulse noise effectively. Computational auditory scene analysis (CASA), which simulates human auditory characteristics, can make up for this weakness. Building on the ETSI system and combining it with CASA, a new speech enhancement algorithm is proposed that performs feature extraction and spectrum estimation in the Gammatone domain rather than the original Mel domain, and filters out noise with an ideal ratio mask (IRM). On the noisy subset of the TIMIT corpus, the proposed algorithm achieves higher objective acoustic quality and better impulse-noise suppression than the original system under low-SNR conditions. It also reduces the word error rate of the back-end speech recognition system under low-SNR conditions.
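
The ideal ratio mask mentioned in the abstract can be illustrated with a minimal sketch. It assumes the commonly used definition IRM = (S / (S + N))^beta with beta = 0.5, computed per frequency channel from known speech and noise powers; the abstract does not state which form the authors use, and their system estimates the mask from the noisy signal rather than computing it from clean references. The function names, the `beta` and `eps` parameters, and the sample values are hypothetical.

```python
def ideal_ratio_mask(speech_power, noise_power, beta=0.5, eps=1e-12):
    """Per-channel mask (S / (S + N)) ** beta for one time frame.

    speech_power, noise_power: sequences of non-negative channel powers
    (for example, one value per Gammatone channel). Assumed inputs.
    eps: small constant that avoids division by zero in silent channels.
    """
    if len(speech_power) != len(noise_power):
        raise ValueError("speech and noise frames must have the same length")
    return [(s / (s + n + eps)) ** beta
            for s, n in zip(speech_power, noise_power)]


def apply_mask(noisy_magnitude, mask):
    """Scale each channel magnitude of the noisy frame by its mask value."""
    if len(noisy_magnitude) != len(mask):
        raise ValueError("frame and mask must have the same length")
    return [m * x for m, x in zip(mask, noisy_magnitude)]


# Hypothetical three-channel frame: equal speech and noise,
# noise only, and speech only.
mask = ideal_ratio_mask([1.0, 0.0, 3.0], [1.0, 2.0, 0.0])
enhanced = apply_mask([2.0, 2.0, 2.0], mask)
```

Channels dominated by noise receive a mask value near 0 and are attenuated, while channels dominated by speech receive a value near 1 and pass almost unchanged, which is why a mask of this kind can suppress short noise bursts that a slowly adapting Wiener estimate misses.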
