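To address the shortcomings of confidence computation under the HMM framework in keyword spotting systems, this paper proposes a confidence measure based on frame-level sub-word posterior probabilities from an MLP. Unlike the HMM framework, where confidence is computed from acoustic model and language model scores, the method works within the MLP framework and uses the per-frame posterior probability of each speech class output by the MLP directly to compute the keyword confidence. This overcomes two weaknesses of HMM modelling: the assumption that the acoustic features of different frames are mutually independent, and the use of Gaussian mixtures with a finite number of components to model each state. Keyword detection and confidence verification use two different model structures and are two completely independent processes, which makes it easy to fuse other confidence features. Experimental results show that the proposed method outperforms the mainstream confidence measures under the HMM framework and is complementary to them. The two confidence measures from the two frameworks are therefore fused, giving a relative improvement of 11.5% in the system's equal error rate (EER).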
Because confidence measures built on the Hidden Markov Model (HMM) framework have several shortcomings in keyword spotting systems, this paper presents a confidence measure based on frame-level sub-word posterior probabilities from a Multi-Layer Perceptron (MLP). Conventionally, the confidence is calculated from the acoustic and language model scores computed by the HMM recogniser, which relies on some incorrect assumptions, such as frame-wise and possibly component-wise independence of acoustic features, and a finite number of Gaussian mixture components. The proposed confidence measure is instead calculated directly from the frame-level sub-word posterior probabilities produced by an MLP network. Confidence estimation is completely separated from keyword spotting, and the two stages use different models. With this separation, decisions can be based on more reliable confidence scores, and multiple confidence features can be integrated to improve decision quality. The experimental results show that the proposed approach outperforms the mainstream confidence measures in the HMM framework and is complementary to them. When it is combined with the mainstream HMM-based confidence measures, the Equal Error Rate (EER) of the keyword spotting system achieves an 11.5% relative improvement.
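The idea of scoring a detected keyword from frame-level sub-word posteriors and then fusing it with an HMM-based score can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything here is an assumption made for illustration: the geometric-mean pooling over frames and sub-words, the linear fusion rule, and the names `keyword_confidence`, `fuse`, `segments`, and `weight` are all hypothetical and are not taken from the paper.

```python
import math


def keyword_confidence(posteriors, segments, floor=1e-10):
    """Pool frame-level sub-word posteriors into one keyword confidence.

    Assumed scheme (not from the paper): average the log posterior of the
    aligned sub-word over the frames of each segment, average those
    segment scores over the keyword, and map back with exp().

    posteriors: list of frames; each frame is a list of class posteriors.
    segments:   list of (subword_index, start_frame, end_frame), end exclusive.
    """
    segment_scores = []
    for subword, start, end in segments:
        if end <= start:
            raise ValueError("each segment must cover at least one frame")
        # Floor the posterior so a zero probability does not give log(0).
        log_sum = sum(
            math.log(max(posteriors[t][subword], floor))
            for t in range(start, end)
        )
        segment_scores.append(log_sum / (end - start))
    return math.exp(sum(segment_scores) / len(segment_scores))


def fuse(mlp_conf, hmm_conf, weight=0.5):
    """Linear fusion of two confidence scores (an assumed fusion rule)."""
    return weight * mlp_conf + (1.0 - weight) * hmm_conf


# Toy example: 4 frames, 2 sub-word classes, a keyword made of 2 sub-words.
toy_posteriors = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.2, 0.8]]
toy_segments = [(0, 0, 2), (1, 2, 4)]
mlp_score = keyword_confidence(toy_posteriors, toy_segments)
fused_score = fuse(mlp_score, hmm_conf=0.6, weight=0.5)
```

Because the MLP score is computed only from the posteriors and a segmentation, it can be produced by a model separate from the one that detected the keyword, which is what lets it be combined with other confidence features in a step such as `fuse`.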