Speaker recognition performance drops sharply when speech carries varying emotions, and sufficient affective speech is rarely available for training speaker models. To address both problems, a speaker recognition method based on pitch-dependent clustering of affective speech is proposed, which makes effective use of the small amount of affective speech available to the system. Separate pitch thresholds are set for male and female speakers. The cepstral features are clustered according to these thresholds, and a model for each pitch range is built for every speaker from the corresponding cluster by MAP adaptation. During matching, the score of the pitch-range model with the maximum likelihood is taken as the score of that speaker. Experiments on a Mandarin affective speech corpus show that the proposed method achieves a higher recognition rate than both the conventional Gaussian mixture model speaker recognition method trained on neutral speech and the structure training method.
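The clustering and scoring steps can be illustrated with a minimal sketch. It is not the authors' implementation: a single diagonal Gaussian per pitch range stands in for the MAP-adapted mixture model, the threshold values in `PITCH_EDGES` are hypothetical placeholders, all function names are invented for illustration, and only voiced frames with a valid pitch value are assumed to be passed in.

```python
import numpy as np

# Hypothetical pitch boundaries in Hz. The abstract only states that the
# thresholds differ for male and female speakers, not what they are.
PITCH_EDGES = {"male": [150.0], "female": [250.0]}


def cluster_by_pitch(cepstra, pitch, edges):
    """Group cepstral frames by the pitch range each frame falls into."""
    bins = np.digitize(pitch, edges)  # range index 0 .. len(edges)
    return {int(b): cepstra[bins == b] for b in np.unique(bins)}


def fit_gaussian(frames, floor=1e-3):
    """Fit one diagonal Gaussian (assumed stand-in for a MAP-adapted GMM)."""
    return frames.mean(axis=0), frames.var(axis=0) + floor


def avg_loglik(frames, model):
    """Average per-frame log-likelihood under a diagonal Gaussian."""
    mean, var = model
    ll = -0.5 * (np.log(2 * np.pi * var) + (frames - mean) ** 2 / var).sum(axis=1)
    return float(ll.mean())


def train_speaker(cepstra, pitch, gender):
    """Build one model per pitch range for a single speaker."""
    clusters = cluster_by_pitch(cepstra, pitch, PITCH_EDGES[gender])
    return {b: fit_gaussian(f) for b, f in clusters.items() if len(f) >= 2}


def speaker_score(frames, models):
    """Speaker score is the best score among that speaker's pitch-range models."""
    return max(avg_loglik(frames, m) for m in models.values())


def identify(frames, speakers):
    """Return the speaker whose best pitch-range model scores highest."""
    return max(speakers, key=lambda s: speaker_score(frames, speakers[s]))
```

Here `speakers` is a dictionary mapping a speaker label to the output of `train_speaker`. Taking the maximum over pitch-range models means a test utterance is matched against whichever range best fits it, without having to estimate the emotion or pitch range of the test speech first.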