In the traditional Katz approach, the discount coefficients can exceed 1 or cannot be computed at all in some cases. To address this, the idea from Simple Good-Turing of smoothing occurrence counts in the log domain is applied to the Katz approach and combined with a back-off model, yielding a modified Katz algorithm. The proposed approach is applied to a lattice-based speech recognition system, and the effects of different language models on the structure of the generated lattice, and on recognition performance based on that structure, are analyzed. Experiments show that the modified Katz algorithm outperforms the traditional Katz approach, with a best recognition rate of 60.90% on a corpus of interview programs.
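The failure mode named in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes the standard Katz discount formula, d_r = (r*/r - (k+1)n_{k+1}/n_1) / (1 - (k+1)n_{k+1}/n_1) with r* = (r+1)n_{r+1}/n_r, and replaces the raw counts-of-counts n_r with values from a least-squares fit of log n_r = a + b log r, in the spirit of Simple Good-Turing. The averaging transform that Simple Good-Turing applies before fitting is omitted for brevity, and the function names, the threshold `k`, and the sample counts are hypothetical.

```python
import math


def fit_log_linear(count_of_counts):
    """Least-squares fit of log(n_r) = a + b * log(r) over observed r."""
    points = [(math.log(r), math.log(n))
              for r, n in count_of_counts.items() if n > 0]
    m = len(points)
    mean_x = sum(x for x, _ in points) / m
    mean_y = sum(y for _, y in points) / m
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def smoothed_n(r, a, b):
    """Smoothed count-of-counts read off the fitted line."""
    return math.exp(a + b * math.log(r))


def katz_discounts(count_of_counts, k=5):
    """Katz discounts d_1..d_k computed from smoothed counts-of-counts."""
    a, b = fit_log_linear(count_of_counts)
    if b >= -1:
        # Assumption of this sketch: the fitted slope must be below -1,
        # otherwise the discounts are not guaranteed to fall below 1.
        raise ValueError("fitted slope must be less than -1")
    tail = (k + 1) * smoothed_n(k + 1, a, b) / smoothed_n(1, a, b)
    discounts = {}
    for r in range(1, k + 1):
        r_star = (r + 1) * smoothed_n(r + 1, a, b) / smoothed_n(r, a, b)
        discounts[r] = (r_star / r - tail) / (1 - tail)
    return discounts


# Hypothetical counts-of-counts: n_3 > n_2, so the raw Good-Turing
# estimate for r = 2 is 3 * 320 / 300 = 3.2, which is larger than r.
sample_counts = {1: 1000, 2: 300, 3: 320, 4: 90, 5: 60, 6: 40}
discounts = katz_discounts(sample_counts, k=5)
```

With the smoothed values, r*/r reduces to (1 + 1/r)^(b+1), which stays below 1 whenever the fitted slope b is less than -1, so each discount in this sketch falls between 0 and 1 even though the raw counts are not monotone.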