
Speaker Recognition of Chinese Whispered Speech Based on Modified MFCC Parameters

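ISSN: 0469-5097
Journal: Journal of Nanjing University (Natural Sciences) (《南京大学学报：自然科学版》)
Date: 0
Classification: TN912.34 [Electronics and Telecommunications: Communication and Information Systems; Electronics and Telecommunications: Information and Communication Engineering]
Author affiliation: [1] Institute of Acoustics, Nanjing University, Nanjing 210093
Funding: National Natural Science Foundation of China (60272037, 60340420325)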

Keywords: whispered speech, speaker recognition, MFCC, hidden Markov model

Abstract (translated from the Chinese):

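Speaker recognition for whispered speech is a relatively new research topic, and many parametric models behave differently than they do for normal speech. For example, the Mel-frequency cepstral coefficients (MFCC) commonly used in speaker recognition are biased in locating the formants and the auditory sensitive region when applied to whispered speech. Based on a study of formant positions and energy in whispered speech and of the human auditory model for whispered speech, the modified MFCC parameters MFCCM and MFCCExp-Log are proposed. Combining the characteristics of the two parameters, the traditional hidden Markov model is improved and a Chinese speaker recognition system suited to whispered speech is built. In speaker recognition experiments on 1600 utterances, the recognition accuracy is 88.88% with MFCCM and 91.38% with MFCCExp-Log; with the improved hidden Markov model, the accuracy rises to 92.31%. All of these are higher than with the traditional parametric model. The experiments show that the modified MFCC parameters can serve as parameters that characterize whispered speech, and that they improve the recognition rate of a whispered-speech speaker recognition system.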

English abstract:

Whispered speech is a mode of speech in which the speaker talks softly, without vibration of the vocal cords, to avoid being overheard. Speaker recognition for whispered speech can be applied in several fields, such as private speech communication in public places and forensic work. Because speaker identification for whispered speech is at an early stage of research, many parameters developed for normal speech are still used, although some of them are not suitable for whispered speech. For example, the Mel-frequency cepstral coefficients (MFCC), which are often used in speaker identification for normal speech, are not suitable for whispered speech because the formant locations and the auditory model in whispered speech differ from those in normal speech. In normal speech, the first formant (F1) lies in the range of 200-1000 Hz, which is the sensitive frequency band of the auditory model. In whispered speech, however, the frequency of F1 is 1.3 times that of normal speech, and the sensitive zone of the auditory model lies in the neighborhood of the second formant (F2). A new frequency scale is therefore needed that emphasizes mid frequencies while de-emphasizing lower and higher frequencies. Two modified MFCC parameters (MFCCM and MFCCExp-Log), based on the locations and energy of the formants and on the auditory model in whispered speech, are proposed to resolve this problem. Furthermore, a speaker recognition system for whispered speech is presented, based on a modified hidden Markov model (HMM) that integrates the advantages of the two modified MFCC parameters. In a test with 1600 Chinese whispered utterances recorded from 10 men and 10 women, the recognition rates are 88.88% for MFCCM and 91.38% for MFCCExp-Log. The recognition rate improves to 92.31% when the modified hidden Markov model is used. These results are more accurate than the traditional method using MFCC and a standard HMM.
As the experiments show, these modified MFCC parameters can be used as characteristic parameters in the whispered s
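
The abstract's argument about the frequency scale can be illustrated with a minimal Python sketch. It uses the widely used Mel formula mel = 2595 * log10(1 + f / 700), which is an assumption here, since the abstract does not say which Mel variant the authors used. It does not implement MFCCM or MFCCExp-Log, whose definitions are not given in this record. The helper names and the idea of scaling the whole 200-1000 Hz band by 1.3 are illustrative only.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Common Mel-scale formula (assumed; the record does not state the variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_resolution(f_hz: float) -> float:
    """Mel units per Hz at f_hz (derivative of hz_to_mel)."""
    return 2595.0 / (math.log(10.0) * (700.0 + f_hz))


# F1 band for normal speech, as stated in the abstract.
NORMAL_F1_BAND_HZ = (200.0, 1000.0)
# The abstract says whispered F1 is 1.3 times that of normal speech.
# Scaling both band edges by this factor is an illustrative assumption.
WHISPER_F1_FACTOR = 1.3
whisper_f1_band_hz = tuple(WHISPER_F1_FACTOR * f for f in NORMAL_F1_BAND_HZ)

for label, (low, high) in (
    ("normal F1 band", NORMAL_F1_BAND_HZ),
    ("whispered F1 band (assumed)", whisper_f1_band_hz),
):
    print(
        f"{label}: {low:.0f}-{high:.0f} Hz, "
        f"resolution {mel_resolution(low):.3f} -> {mel_resolution(high):.3f} mel/Hz"
    )
```

Because `mel_resolution` decreases monotonically with frequency, the standard scale devotes its finest resolution to the lowest frequencies. That is consistent with the abstract's point that whispered speech, whose F1 and auditory sensitive zone sit higher, calls for a scale that emphasizes mid frequencies instead.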
