In speech emotion recognition, noise conditions, speaking styles, and speaker traits cause the feature vectors of training and test data to follow mismatched distributions, a problem that arises chiefly in cross-corpus emotion recognition experiments. The resulting mismatch between the trained acoustic models and the test utterances drastically degrades recognition performance. Spectrogram features can effectively complement existing emotion features from an image perspective. The auditory selective attention model studied in this paper simulates the hearing characteristics of the human ear and can effectively detect changing emotional features in the spectrogram. Meanwhile, the model is improved with Chirplet time-frequency atoms, exploiting their advantage in matching the frequency characteristics of signals to extract emotional information in the time domain. The selective attention mechanism enables the model to extract salient gist features across speech corpora and thereby improves the discriminative power of the speech emotion recognition system. Experimental results show that extracting features from cross-corpus emotion samples with the proposed method and classifying them with a typical classifier improves recognition accuracy by about 9 percentage points, confirming that the method is more robust to differences between databases.
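The abstract does not specify the exact attention model, but auditory selective attention on spectrograms is commonly instantiated as center-surround contrast (Kayser-style saliency): regions where energy differs sharply from the local time-frequency surround are kept as salient. The sketch below illustrates that idea only; the synthetic signal, the Gaussian filter scales, and the scipy-based pipeline are all illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumed center-surround setup, not the paper's exact
# model) of auditory saliency on a spectrogram: local time-frequency
# contrast highlights regions where emotional cues such as pitch and
# energy changes stand out from their surroundings.
import numpy as np
from scipy.signal import spectrogram
from scipy.ndimage import gaussian_filter

fs = 16000                                  # assumed sampling rate (Hz)
t = np.arange(0, 1.0, 1 / fs)
# Synthetic utterance stand-in: a tone with a brief frequency excursion.
f_inst = 300 + 200 * np.exp(-((t - 0.5) ** 2) / (2 * 0.05 ** 2))
x = np.cos(2 * np.pi * np.cumsum(f_inst) / fs)  # integrate IF to phase

_, _, S = spectrogram(x, fs=fs, nperseg=512, noverlap=384)
log_S = np.log(S + 1e-10)

# Center-surround difference: fine-scale smoothing minus coarse-scale
# smoothing approximates the excitatory-center / inhibitory-surround
# receptive fields used in auditory attention models (sigmas assumed).
center = gaussian_filter(log_S, sigma=1)
surround = gaussian_filter(log_S, sigma=8)
saliency = np.maximum(center - surround, 0)

# The most salient time frame would be retained as a "gist" feature region.
print("most salient frame index:", saliency.sum(axis=0).argmax())
```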
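The "frequency characteristic matching" advantage of Chirplet atoms can be made concrete: a Chirplet is a Gaussian-windowed linear chirp, so its inner product with a signal frame peaks when its chirp rate tracks a frequency glide, which is exactly the kind of time-domain pitch movement that carries emotion. The following self-contained sketch assumes a synthetic gliding tone, illustrative parameter values, and a matching-pursuit-style score; it is not the authors' implementation.

```python
# Minimal sketch of a Gaussian Chirplet atom and its correlation with a
# signal frame. The atom whose chirp rate matches the signal's frequency
# glide yields the largest match score. All values are illustrative.
import numpy as np

def chirplet(t, t0, f0, c, s):
    """Gaussian chirplet atom: Gaussian window centered at time t0 with
    scale s, modulated by a linear chirp at f0 Hz with rate c Hz/s."""
    window = np.exp(-0.5 * ((t - t0) / s) ** 2)
    phase = 2 * np.pi * (f0 * (t - t0) + 0.5 * c * (t - t0) ** 2)
    atom = window * np.exp(1j * phase)
    return atom / np.linalg.norm(atom)      # unit energy

fs = 16000                                  # assumed sampling rate (Hz)
t = np.arange(0, 0.1, 1 / fs)               # a 100 ms analysis frame
# Synthetic "emotional" component: pitch gliding upward at 3000 Hz/s.
signal = np.cos(2 * np.pi * (200 * t + 0.5 * 3000 * t ** 2))

# Matching-pursuit-style inner products with atoms of different chirp
# rates; the best-matching atom's parameters would serve as time-domain
# emotion features. f0=350 Hz aligns with the glide at the frame center.
for rate in (0.0, 1500.0, 3000.0):
    g = chirplet(t, t0=0.05, f0=350.0, c=rate, s=0.02)
    score = abs(np.vdot(g, signal))
    print(f"chirp rate {rate:7.1f} Hz/s -> match {score:.2f}")
```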