This paper proposes a novel term-dependent confidence normalization algorithm based on ATWV (Actual Term-Weighted Value) optimization, in which the confidence score of each candidate keyword is adjusted according to the number of times the term occurs in the test set. To address the confidence bias that arises in ATWV optimization, we propose two compensation schemes: linear compensation and discriminative compensation. Linear compensation adjusts the confidence score linearly by adding a weighting coefficient and a shift (translation) coefficient. Discriminative compensation uses discriminative model training to convert the confidence score into a posterior probability of correct classification, which meets the requirements of the ATWV computation and reduces the impact of confidence bias. Keyword spotting experiments on the English WSJ speech corpus show that the proposed confidence normalization methods significantly improve system performance.
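The abstract does not give the exact normalization formulas. The Python sketch below illustrates the general idea under stated assumptions. ATWV follows the NIST 2006 STD definition, with β = 999.9 and the number of trials approximated by the seconds of speech. Term-dependent normalization is shown as the standard TWV-optimal per-term threshold. A detection with correctness probability p is worth accepting when its expected gain p/N outweighs its expected cost β(1 − p)/(T − N), which gives p > βN/(T + (β − 1)N), with the term's true count N estimated by summing its scores. The linear compensation is a hypothetical clipped affine map with weight `a` and shift `b`. One-dimensional logistic regression stands in for the paper's discriminative model, which may differ. All function names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

BETA = 999.9  # NIST STD 2006 false-alarm weighting (assumed, not stated in the abstract)


def atwv(detections: Dict[str, List[Tuple[float, bool]]],
         n_true: Dict[str, int], t_speech: float, beta: float = BETA) -> float:
    """ATWV = 1 - mean over terms of [P_miss + beta * P_FA].

    detections[term] lists the (score, is_correct) pairs the system accepted;
    t_speech approximates the number of trials by seconds of speech."""
    losses = []
    for term, n in n_true.items():
        if n == 0:
            continue  # terms with no reference occurrence are left out of the average
        accepted = detections.get(term, [])
        hits = sum(1 for _, ok in accepted if ok)
        false_alarms = len(accepted) - hits
        p_miss = 1.0 - hits / n
        p_fa = false_alarms / (t_speech - n)
        losses.append(p_miss + beta * p_fa)
    return 1.0 - sum(losses) / len(losses)


def td_threshold(probs: List[float], t_speech: float, beta: float = BETA) -> float:
    """Term-dependent threshold: accept p when p / N > beta * (1 - p) / (T - N),
    i.e. p > beta * N / (T + (beta - 1) * N), with N estimated as sum(probs)."""
    n_est = sum(probs)
    return beta * n_est / (t_speech + (beta - 1.0) * n_est)


def linear_comp(p: float, a: float, b: float) -> float:
    """Hypothetical linear compensation: weight a, shift b, clipped to [0, 1]."""
    return min(1.0, max(0.0, a * p + b))


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(xs: List[float], ys: List[int],
                   lr: float = 0.5, epochs: int = 2000) -> Tuple[float, float]:
    """Stand-in discriminative compensation: fit P(correct | score) = sigmoid(w*x + c)
    by batch gradient descent on the log-loss."""
    w, c = 0.0, 0.0
    for _ in range(epochs):
        gw = gc = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + c) - y
            gw += err * x
            gc += err
        w -= lr * gw / len(xs)
        c -= lr * gc / len(xs)
    return w, c


def decide(scores: List[float], t_speech: float,
           compensate: Callable[[float], float]) -> List[bool]:
    """YES/NO decisions for one term's hits: compensate, then apply the term threshold."""
    probs = [compensate(s) for s in scores]
    thr = td_threshold(probs, t_speech)
    return [p > thr for p in probs]
```

For example, with ten hours of speech (`t_speech = 36000.0`), `decide([0.9, 0.8, 0.1], 36000.0, lambda s: linear_comp(s, 1.0, -0.06))` keeps the first two hits and rejects the third, because the shift pushes that hit's score below the term's threshold. Uncompensated, all three would be accepted. The threshold is computed from the compensated scores on the assumption that compensation makes them closer to calibrated probabilities. For the discriminative variant, `w, c = train_logistic(dev_scores, dev_labels)` on held-out data would be followed by `decide(test_scores, t_speech, lambda s: sigmoid(w * s + c))`.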