The human auditory system far outperforms automatic speech recognition (ASR) systems, and the fractional Fourier transform (FrFT) has distinct advantages in processing nonstationary signals, chirp signals in particular. In this paper, a Gammatone auditory filterbank is applied to the speech signal for front-end temporal filtering, and acoustic features are then extracted from each output subband signal using the FrFT. Because the transform order has a strong influence on FrFT performance, an order adaptation method for the subband time-domain signals, based on curve fitting of the instantaneous frequency, is proposed and compared with a method based on the ambiguity function. ASR experiments on clean and noisy Mandarin isolated-digit corpora show that the proposed features achieve a significantly higher recognition rate than the MFCC baseline. Compared with the ambiguity-function method, the order search based on the instantaneous frequency curve requires far less computation, and the features extracted with it achieve the highest average recognition rate.
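As a rough illustration of choosing the FrFT order from an instantaneous frequency curve, the following Python sketch fits a straight line to the instantaneous frequency of one frame and maps the fitted chirp rate to a transform order. It is not the procedure used in the paper. It is a minimal sketch under several assumptions: the frame is already a complex analytic signal (for a real subband output this would first require, for example, a Hilbert transform), the curve fit is a first-order least-squares fit, and the order is obtained from the textbook relation p = 1 + (2/π)·arctan(k·N/fs²) for a linear chirp of rate k, which holds for one common FrFT kernel convention and normalization (the sign may flip under other conventions). All function names are hypothetical.

```python
import cmath
import math


def instantaneous_frequency(z, fs):
    """Instantaneous frequency (Hz) of a complex analytic signal z sampled at fs."""
    # The phase of z[n+1] * conj(z[n]) is the phase increment between
    # consecutive samples, so no explicit phase unwrapping is needed.
    return [fs * cmath.phase(z[n + 1] * z[n].conjugate()) / (2.0 * math.pi)
            for n in range(len(z) - 1)]


def fit_line(x, y):
    """Ordinary least-squares fit y ~ slope * x + intercept."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def frft_order_from_if(z, fs):
    """Return (order, chirp_rate) estimated from one analytic frame z.

    Assumption: the frame is modelled as a single linear chirp, and the
    order follows p = 1 + (2/pi) * atan(k * N / fs**2).
    """
    f_inst = instantaneous_frequency(z, fs)
    # Each frequency estimate is centred between samples n and n + 1.
    t = [(n + 0.5) / fs for n in range(len(f_inst))]
    chirp_rate, _ = fit_line(t, f_inst)        # slope in Hz per second
    k_norm = chirp_rate * len(z) / fs ** 2     # dimensionless chirp rate
    order = 1.0 + (2.0 / math.pi) * math.atan(k_norm)
    return order, chirp_rate


def make_chirp(f0, rate, fs, n_samples):
    """Synthetic analytic linear chirp, used only to exercise the sketch."""
    return [cmath.exp(2j * math.pi * (f0 * (n / fs) + 0.5 * rate * (n / fs) ** 2))
            for n in range(n_samples)]


if __name__ == "__main__":
    frame = make_chirp(f0=500.0, rate=20000.0, fs=8000.0, n_samples=256)
    order, rate = frft_order_from_if(frame, fs=8000.0)
    print(f"estimated chirp rate: {rate:.1f} Hz/s, FrFT order: {order:.4f}")
```

A constant-frequency frame yields an order of 1, which corresponds to the ordinary Fourier transform. Because only a line fit over the frame is needed, this kind of estimate avoids a search over candidate orders, which is consistent with the lower computational cost reported for the instantaneous-frequency method.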