
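An Improved Posterior Probability Estimation Method for Pronunciation Evaluation of Free-Form Spoken Speech (自由表述口语语音评测后验概率估计改进方法)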

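ISSN: 1003-0077
Journal: Journal of Chinese Information Processing (《中文信息学报》)
Date: 0
Classification: TP18 [Automation and Computer Technology: Control Science and Engineering; Automation and Computer Technology: Control Theory and Control Engineering]
Author affiliations: [1] National Engineering Laboratory for Speech and Language Information Processing, Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei 230027
Funding: Supported by the National Natural Science Foundation of China (No. 61273264)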

Authors: Xu Sukui (许苏魁) [1], Dai Lirong (戴礼荣) [1], Wei Si (魏思) [2], Liu Qingfeng (刘庆峰) [1,2], Gao Qianyong (高前勇) [2]

Keywords: Tibetan, continuous speech recognition, data-driven, deep neural networks (DNN)

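Abstract (translated from Chinese):

This paper is the first to address large-vocabulary continuous speech recognition for spontaneous conversational telephone speech in Tibetan. As a minority language, Tibetan faces data sparsity as its greatest difficulty in recognition. For acoustic modeling based on deep neural networks (DNN), the paper addresses data sparsity by taking a DNN already trained on data from a major language as the initial network of the target model and then optimizing it further. In addition, because research on Tibetan phonetics is far from complete, creating a decision-tree question set by hand is not feasible. To address this, the paper generates the decision-tree question set automatically in a data-driven manner and uses it to tie the states of the triphone hidden Markov models (HMM), which reduces the number of model parameters to be estimated. On the test set, the Tibetan character recognition rate with Gaussian mixture model (GMM) acoustic modeling is 30.86%. For DNN-based acoustic modeling, DNNs trained on data from three major languages are used as initial networks, and the effectiveness of the method is verified on the test set, where Tibetan character recognition accuracy reaches 43.26%.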
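
The initialization strategy in the abstract can be illustrated with a small sketch. The abstract does not say which layers are carried over, so this example assumes a common cross-lingual setup: the hidden layers of the source-language DNN are copied, and the output layer is created afresh because the tied-state inventory of the target language differs. The function names, the layer sizes, and the plain-list weight representation are all hypothetical and are not taken from the paper.

```python
import copy
import random


def init_layer(n_in, n_out, rng, scale=0.1):
    """Create one fully connected layer with small random weights and zero biases."""
    weights = [[rng.uniform(-scale, scale) for _ in range(n_out)] for _ in range(n_in)]
    return {"W": weights, "b": [0.0] * n_out}


def init_dnn(layer_sizes, rng):
    """Build a list of layers for sizes such as [input, hidden, ..., output]."""
    pairs = zip(layer_sizes[:-1], layer_sizes[1:])
    return [init_layer(n_in, n_out, rng) for n_in, n_out in pairs]


def init_from_source(source_dnn, n_target_outputs, rng):
    """Copy the hidden layers of a source DNN and attach a new output layer.

    Assumption: only the output layer is language specific, so it is the
    only layer that is re-created with random weights.
    """
    target = copy.deepcopy(source_dnn[:-1])    # independent copy of hidden layers
    n_last_hidden = len(source_dnn[-1]["W"])   # input width of the old output layer
    target.append(init_layer(n_last_hidden, n_target_outputs, rng))
    return target


rng = random.Random(0)
# Hypothetical sizes: 39 input features, two hidden layers, 500 source states.
source = init_dnn([39, 64, 64, 500], rng)
# Hypothetical target inventory of 200 tied states.
target = init_from_source(source, 200, rng)
print(len(target), len(target[-1]["b"]))
```

After this step the target network would be trained on the target-language data as usual; the copied layers serve only as a starting point.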

English abstract:

This paper addresses, for the first time, large-vocabulary continuous speech recognition of conversational telephone speech in Tibetan. As a minority language, Tibetan poses one major difficulty for speech recognition: data deficiency. The acoustic model of Tibetan is trained based on deep neural networks (DNN). To address the data deficiency, DNN models of other majority languages are used as the initial networks of the target Tibetan DNN model. In addition, phonetic questions written by phonetic experts are unavailable for Tibetan because knowledge of its phonetics is lacking. To reduce the number of triphone hidden Markov models (HMM) in Tibetan speech recognition, phonetic questions generated automatically in a data-driven manner are used for tying the triphone HMMs. Different clusterings of triphone states are tested, and the word accuracy is about 30.86% on the test corpus with the Gaussian mixture model (GMM). When the acoustic model is trained based on DNN, three kinds of DNN models trained on different large corpora are adopted. The experimental results show that the proposed methods improve recognition performance, and the word accuracy is about 43.26% on the test corpus.
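
One way to picture the data-driven question generation mentioned in the abstract is bottom-up clustering of phones by acoustic similarity, where every intermediate cluster becomes one decision-tree question. The abstract does not name the clustering algorithm, so the sketch below is an assumption: it uses simple agglomerative clustering on per-phone mean feature vectors, and the phone labels and two-dimensional vectors are toy values invented for illustration.

```python
def centroid(vectors):
    """Return the component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def generate_questions(phone_means):
    """Cluster phones bottom-up; each merged cluster becomes one question.

    The final cluster, which holds every phone, is dropped because a
    question that is true for all phones cannot split any tree node.
    """
    clusters = [[phone] for phone in sorted(phone_means)]
    questions = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters whose centroids are closest.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                ci = centroid([phone_means[p] for p in clusters[i]])
                cj = centroid([phone_means[p] for p in clusters[j]])
                d = sq_dist(ci, cj)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        questions.append(merged)
    return questions[:-1]


# Toy per-phone mean vectors (hypothetical values, not from the paper).
phone_means = {
    "a": [1.0, 0.0],
    "e": [0.8, 0.1],
    "k": [0.0, 1.0],
    "g": [0.1, 0.9],
}
questions = generate_questions(phone_means)
print(questions)
```

Each returned phone set can then be asked about the left or right context of a triphone when growing the state-tying tree, in place of a hand-written phonetic class such as "vowel" or "stop".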

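Project associated with this journal paper

Deep Representation of Voiceprint Information Components in Speech Signals

Journal papers: 4

Journal papers from the same project

Phoneme-Centric Feature-Domain Factor Analysis for Speaker Verification

A Speaker Information Extraction Method Based on Deep Belief Networks

Deep Speech Signal and Information Processing: Research Progress and Prospects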

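Journal information

Journal of Chinese Information Processing (《中文信息学报》)
Peking University Core Journal (2011 edition)

Supervising body: China Association for Science and Technology
Sponsors: Chinese Information Processing Society of China; Institute of Software, Chinese Academy of Sciences
Editor-in-chief: Sun Maosong (孙茂松)
Address: Institute of Software, Chinese Academy of Sciences, 4 Zhongguancun South Fourth Street, Haidian District, Beijing
Postal code: 100190
Email: jcip@iscas.ac.cn
Phone: 010-62562916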

International standard serial number: ISSN 1003-0077
Domestic unified serial number: CN 11-2325/N
Postal distribution code:

Awards:

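Indexed in domestic and international databases:
Japan: Japan Science and Technology Agency database; China: China Science and Technology Core Journals; China: Peking University Core Journals (2000, 2004, 2008, 2011, and 2014 editions)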

Citations: 9136