Large vocabulary continuous speech recognition (LVCSR) systems must integrate several knowledge sources, such as the acoustic model, the language model, and the pronunciation dictionary. The decoder network is the foundation of the recognition engine and has a critical influence on decoder performance. Integrating these knowledge sources into a compact decoder network reduces the search space and the amount of repeated computation during recognition, which significantly increases decoding speed. This paper studies dynamic decoder networks for speech recognition and proposes a word-end (WE) node pushing algorithm which, combined with the conventional forward and backward merging algorithms, yields a compact dynamic decoder network whose nodes are hidden Markov model states. The optimized network has one quarter as many nodes and edges as a linear lexicon decoder network and half as many as the open-source toolkit HDecode. The number of nodes at which the language model look-ahead score must be computed is also half that of HDecode. Because the acoustic model is based on triphones, the decoder network can easily be ported to other languages.
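To make the effect of forward merging concrete, the following is a minimal sketch that compares the node count of a linear lexicon network with that of a prefix-merged tree. It is an illustration under stated assumptions, not the paper's implementation: the toy lexicon, the phone symbols, the `#WE` marker key, and all function names are hypothetical, nodes are phones rather than HMM states, and neither backward merging nor the WE node pushing algorithm is implemented.

```python
def linear_node_count(lexicon):
    """Nodes in a linear lexicon network: one separate phone chain per word."""
    return sum(len(phones) for phones in lexicon.values())


def forward_merge(lexicon):
    """Merge shared pronunciation prefixes into a tree (forward merging).

    Returns a nested dict in which each key is a phone node. The
    hypothetical key "#WE" stores the words that end at that node.
    """
    root = {}
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            node = node.setdefault(phone, {})
        node.setdefault("#WE", []).append(word)
    return root


def tree_node_count(node):
    """Count phone nodes in the merged tree, skipping word-end markers."""
    return sum(
        1 + tree_node_count(child)
        for key, child in node.items()
        if key != "#WE"
    )


# Toy lexicon (assumed for illustration only).
LEXICON = {
    "speech": ["s", "p", "iy", "ch"],
    "speed": ["s", "p", "iy", "d"],
    "speak": ["s", "p", "iy", "k"],
    "spin": ["s", "p", "ih", "n"],
}

tree = forward_merge(LEXICON)
print(linear_node_count(LEXICON), tree_node_count(tree))  # prints "16 8"
```

In this toy case the four words share the prefix `s p`, so forward merging alone halves the node count from 16 to 8. The reduction reported in the abstract comes from the full combination of forward merging, backward merging, and WE node pushing on a real lexicon.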