To address the low accuracy of speaker recognition based on a single voice feature extraction method, two features are extracted from the speaker's voice as feature parameters: the Mel-frequency cepstral coefficient (MFCC) feature, which reflects the perceptual characteristics of human hearing, and the one-third octave (1/3 octave) feature, which approximates the psychoacoustic critical bands. A Bayesian network for speaker recognition is designed to fuse the two kinds of voice feature parameters, and speaker recognition is achieved through Bayesian network inference. Through a learning process, the Bayesian network determines the conditional probability of each voice feature for every registered speaker. During recognition, the Bayesian network uses Bayes' theorem and the conditional independence assumption to fuse the MFCC feature and the 1/3 octave feature of the voice of the speaker to be identified, computes the posterior probability of each registered speaker given the input voice feature vectors, and infers the identity of the speaker from the magnitudes of these posterior probabilities. Speaker recognition experiments show that the proposed method, based on Bayesian network fusion of multiple voice features, is feasible and effective, with a correct recognition rate of up to 100%.
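The fusion step described in the abstract can be illustrated with a minimal sketch. It assumes that the likelihood of each feature stream under each registered speaker has already been obtained from the learned conditional probabilities, and that all priors and likelihoods are strictly positive. The function names `fuse_posteriors` and `identify`, and the dictionary layout, are hypothetical and are not taken from the paper; how the paper models or discretizes the features is not specified here.

```python
import math


def fuse_posteriors(priors, lik_mfcc, lik_octave):
    """Posterior P(speaker | MFCC, 1/3 octave) under conditional independence.

    All three arguments are dicts keyed by speaker label (hypothetical layout).
    Values must be strictly positive.
    """
    # Bayes' theorem with conditionally independent features:
    # P(s | m, o) is proportional to P(s) * P(m | s) * P(o | s).
    log_joint = {
        s: math.log(priors[s]) + math.log(lik_mfcc[s]) + math.log(lik_octave[s])
        for s in priors
    }
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_joint.values())
    unnorm = {s: math.exp(v - peak) for s, v in log_joint.items()}
    total = sum(unnorm.values())
    return {s: u / total for s, u in unnorm.items()}


def identify(priors, lik_mfcc, lik_octave):
    """Return the speaker with the largest posterior, plus all posteriors."""
    post = fuse_posteriors(priors, lik_mfcc, lik_octave)
    return max(post, key=post.get), post


if __name__ == "__main__":
    # Toy numbers chosen only for illustration, not experimental data.
    priors = {"A": 0.5, "B": 0.5}
    lik_mfcc = {"A": 0.6, "B": 0.4}
    lik_octave = {"A": 0.3, "B": 0.7}
    best, post = identify(priors, lik_mfcc, lik_octave)
    print(best, post)
```

In this toy case the MFCC feature alone favours speaker A, while the fused posterior favours speaker B, which shows how the second feature can change the decision made from a single feature.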