To address the problem that existing pronunciation-lexicon expansion methods can only learn new words from text data and cannot learn them from audio data, a new-word learning method for pronunciation lexicons based on hybrid speech recognition systems is proposed. The method first applies a syllable-based and a graphone-based (grapheme-phoneme pair) hybrid recognition system to detect out-of-vocabulary words in the audio data, exploiting the complementarity of the two systems to obtain as many new words and pronunciation candidates as possible. The new words and their pronunciations are then refined with a perceptron model and a maximum entropy model to reduce the error rate. Finally, the pronunciation lexicon is expanded and the language model parameters are updated using syntactic and semantic information. Continuous speech recognition experiments on the Wall Street Journal (WSJ) corpus show that the proposed method effectively learns unknown new words from audio data and that the data optimization strategy greatly improves the accuracy of the learned words and pronunciations; with the expanded lexicon, the system's recognition performance improves by about 13.4% relative to the baseline system in terms of word error rate.
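The candidate-merging and filtering step outlined above can be pictured with a minimal sketch: the two hybrid systems each propose (word, pronunciation) candidates for out-of-vocabulary regions, the union of the candidates is scored by a simple linear decision rule standing in for the perceptron / maximum-entropy filter, and accepted entries are added to the lexicon. All names, features, and the scoring rule below are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of candidate merging and filtering for lexicon expansion.
# The feature set and linear scoring rule are placeholders for the paper's
# perceptron / maximum entropy models.
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    word: str            # hypothesized new word (orthography)
    pron: tuple           # hypothesized phone sequence
    acoustic_score: float  # score from the recognizer
    lm_score: float        # score from the language model


def merge_candidates(syllable_hyps, graphone_hyps):
    """Union of candidates from the syllable-based and graphone-based systems."""
    return set(syllable_hyps) | set(graphone_hyps)


def keep(cand, weights, bias=0.0, threshold=0.0):
    """Stand-in for the perceptron / max-ent filter: a linear score over a few
    simple features decides whether the candidate is trusted enough."""
    features = (cand.acoustic_score, cand.lm_score, float(len(cand.pron)))
    score = bias + sum(w * f for w, f in zip(weights, features))
    return score > threshold


def expand_lexicon(lexicon, syllable_hyps, graphone_hyps, weights):
    """Add every accepted (word, pronunciation) candidate to the lexicon."""
    for cand in merge_candidates(syllable_hyps, graphone_hyps):
        if keep(cand, weights):
            lexicon.setdefault(cand.word, set()).add(cand.pron)
    return lexicon
```

In this sketch the lexicon is a plain dict mapping words to sets of pronunciations; in practice the accepted entries would feed the language-model update step described above.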