Automatic recognition of human emotion from speech is an emerging research topic with broad application prospects in human-machine communication. Short-term and long-term speech features are each used to classify five emotional states of the speaker: anger, happiness, sadness, surprise, and a neutral state. A short-term feature vector based on pitch frequency, sub-band spectral energy, and the first formant frequency is proposed, together with long-term feature parameters that reflect the distribution and dynamics of the energy spectrum. Two classification methods, the hidden Markov model (HMM) and the support vector machine (SVM), are used, respectively, as classifiers. Recognition experiments were conducted on a Mandarin Chinese emotional speech database and the Danish Emotional Speech (DES) database. The experimental results indicate that both kinds of features achieve high recognition rates.
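The abstract does not give the exact definition of the long-term features, so the following is only an illustrative sketch of one way to summarize the distribution and dynamics of the energy spectrum over an utterance. The function names, the frame length, the hop size, the number of equal-width sub-bands, and the choice of statistics (per-band mean, standard deviation, and mean absolute frame-to-frame change of log energy) are all assumptions made for illustration, not the authors' method. A vector of this kind could then be passed to a classifier such as an SVM, which is not shown here.

```python
import cmath
import math

def frame_signal(x, frame_len, hop):
    """Split a sequence of samples into overlapping frames."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]

def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        s = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spec.append(abs(s) ** 2 / n)
    return spec

def subband_energies(frame, n_bands):
    """Sum the power spectrum over n_bands roughly equal-width bands (assumed layout)."""
    spec = power_spectrum(frame)
    edges = [round(b * len(spec) / n_bands) for b in range(n_bands + 1)]
    return [sum(spec[edges[b]:edges[b + 1]]) for b in range(n_bands)]

def long_term_features(x, frame_len=64, hop=32, n_bands=4):
    """Per band: mean, standard deviation, and mean absolute
    frame-to-frame change of the log sub-band energy (assumed statistics)."""
    frames = frame_signal(x, frame_len, hop)
    log_e = [[math.log(e + 1e-12) for e in subband_energies(f, n_bands)]
             for f in frames]
    feats = []
    for b in range(n_bands):
        track = [row[b] for row in log_e]
        mean = sum(track) / len(track)
        std = math.sqrt(sum((v - mean) ** 2 for v in track) / len(track))
        steps = max(len(track) - 1, 1)
        delta = sum(abs(track[i + 1] - track[i])
                    for i in range(len(track) - 1)) / steps
        feats.extend([mean, std, delta])
    return feats

# Usage on a synthetic 500 Hz tone sampled at 8 kHz (not real emotional speech).
sr = 8000
tone = [math.sin(2 * math.pi * 500 * t / sr) for t in range(1024)]
features = long_term_features(tone)
```

With four bands the sketch yields a 12-dimensional vector, laid out as (mean, standard deviation, change) for each band in turn. A practical system would use an FFT and perceptually motivated band edges instead of the naive DFT and equal-width bands used here.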