Motivated by findings in phonetics, this paper hypothesizes that speakers who sound similar in the neutral state also sound similar in emotional states, a regularity termed the Similar Neighbor Phenomenon. Quantitative and qualitative analyses support the hypothesis: for the same phonetic content, the corresponding Gaussian components of a given speaker's neutral and emotional models have largely the same "neighbors". To address the mismatch between the distribution of short-time speech features under emotional states and the neutral speaker model, an emotional speaker model synthesis method is proposed. The neutral-to-emotion transformation rules learned on a development corpus are transferred to the evaluation corpus, so that a speaker's emotional model can be synthesized from his or her neutral model. Following the Similar Neighbor Phenomenon, several similar neighbors of the speaker in the neutral state are selected by KL distance, and the speaker's emotional model is then synthesized with a neighbor-based method and a neighbor-transformation (shift-based) method. Experiments on the MASC corpus show that the identification rate (IR) is 2.81% higher than that of the conventional GMM-UBM algorithm and 1.3% higher than that of the emotional attribute projection (EAP) method.
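The neighbor selection and synthesis steps can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: each speaker is reduced to a single diagonal-covariance Gaussian rather than a full GMM, the KL distance is the closed-form KL divergence between two such Gaussians, the neighbor-based variant averages the neighbors' emotional means, and the shift-based variant adds the neighbors' average neutral-to-emotion mean shift to the target speaker's neutral mean. The function names and the toy data are hypothetical.

```python
import math


def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p || q) for two diagonal-covariance Gaussians."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )


def nearest_neighbors(target, dev_neutral, k):
    """Return the k development speakers whose neutral Gaussian is
    closest (smallest KL) to the target speaker's neutral Gaussian."""
    mu_t, var_t = target
    ranked = sorted(
        dev_neutral,
        key=lambda s: kl_diag_gauss(mu_t, var_t, *dev_neutral[s]),
    )
    return ranked[:k]


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def synth_neighbor_based(neighbors, dev_emotional):
    """Assumed neighbor-based variant: average the neighbors' emotional means."""
    return mean_vector([dev_emotional[s] for s in neighbors])


def synth_shift_based(target_mu, neighbors, dev_neutral, dev_emotional):
    """Assumed shift-based variant: add the neighbors' average
    neutral-to-emotion mean shift to the target's neutral mean."""
    shifts = [
        [e - n for e, n in zip(dev_emotional[s], dev_neutral[s][0])]
        for s in neighbors
    ]
    avg_shift = mean_vector(shifts)
    return [m + d for m, d in zip(target_mu, avg_shift)]


# Toy development data (hypothetical): speaker -> (neutral mean, neutral variance)
dev_neutral = {
    "A": ([0.0, 0.0], [1.0, 1.0]),
    "B": ([1.0, 0.0], [1.0, 1.0]),
    "C": ([5.0, 5.0], [1.0, 1.0]),
}
# Toy emotional means for the same development speakers
dev_emotional = {"A": [1.0, 2.0], "B": [2.0, 2.0], "C": [9.0, 9.0]}

# Evaluation speaker for whom only a neutral model is available
target = ([0.4, 0.0], [1.0, 1.0])

neighbors = nearest_neighbors(target, dev_neutral, k=2)
emo_a = synth_neighbor_based(neighbors, dev_emotional)
emo_b = synth_shift_based(target[0], neighbors, dev_neutral, dev_emotional)
```

In this toy setting the distant speaker "C" is excluded from the neighbor set, and the two variants give slightly different emotional means because the shift-based one keeps the target's own neutral position. A full system along the lines of the abstract would apply such a step per Gaussian component of UBM-adapted speaker models, which this sketch does not attempt.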