Model-based speech/non-speech detection methods are not sufficiently robust. To address this problem, a hierarchical detection method is proposed. In the first layer, the test data are coarsely classified into two classes. The second layer builds on this coarse result in three steps. First, reliable data are selected according to the High Zero-Crossing Rate Ratio (HZCRR) and Short-Time Energy (STE) features and used to build the initial silence model and the initial audible non-speech model. Second, three adaptive detection models are trained iteratively. Finally, the detection results are corrected using the Bayesian Information Criterion (BIC). Experimental results show that, compared with model-based methods, the hierarchical method adapts to a variety of test data and achieves higher detection accuracy and stronger robustness.
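The features and the criterion named above can be illustrated with a minimal sketch. It is not the method's implementation: the abstract gives no formulas, so the sketch assumes the commonly used definitions, namely STE as the sum of squared samples in a frame, HZCRR as the fraction of frames in a window whose zero-crossing rate exceeds 1.5 times the window average, and a ΔBIC between two segments that are each modeled by a single full-covariance Gaussian. All function names, the factor 1.5, and the penalty weight `lam` are illustrative assumptions.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def short_time_energy(frames):
    """STE per frame: sum of squared samples (assumed definition)."""
    return np.sum(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs in each frame that change sign."""
    signs = np.sign(frames)
    signs[signs == 0] = 1  # treat exact zeros as positive
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def hzcrr(zcr, factor=1.5):
    """Fraction of frames whose ZCR exceeds factor * mean ZCR of the window.

    The factor 1.5 is an assumption; the abstract does not state it.
    """
    return np.mean(zcr > factor * np.mean(zcr))

def delta_bic(a, b, lam=1.0):
    """Delta-BIC between two feature segments a and b, each of shape (n, d).

    Each segment and their union are modeled by one full-covariance
    Gaussian (an assumption). A positive value favors treating the two
    segments as different classes.
    """
    ab = np.vstack([a, b])
    n, d = ab.shape

    def logdet(z):
        cov = np.atleast_2d(np.cov(z, rowvar=False))
        return np.linalg.slogdet(cov)[1]

    penalty = 0.5 * lam * (d + d * (d + 1) / 2) * np.log(n)
    gain = 0.5 * (n * logdet(ab) - len(a) * logdet(a) - len(b) * logdet(b))
    return gain - penalty
```

Under these assumptions, frames with low STE would be natural candidates for the initial silence model, and the sign of `delta_bic` for two adjacent segments could indicate whether a boundary between them should be kept or merged. The actual selection thresholds and correction rule are not given in the abstract.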