This paper proposes a method for detecting the boundaries between initials and finals in continuous Chinese speech, based on their energy distribution and formant structure. The speech signal is first passed through Seneff's auditory model to obtain an auditory spectrum. From this spectrum, feature parameters including full-band energy, low-band energy, spectral centroid, high-to-low frequency energy ratio, and mid-to-high band energy are selected to describe the energy distribution and formant structure of each class of initials and finals. Boundaries are then placed at the points where these parameters change abruptly, and refined using the first-order difference of the envelope and a sample-based Kullback-Leibler distance. Experimental results show that, at an 8 kHz sampling rate, the boundary detection accuracy reaches 93.7% for clean speech, stays above 85.3% for noisy speech at an SNR of 10 dB, and stays above 86.7% for speech that has passed through a parametric codec.
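The feature extraction and change-point step described above can be illustrated with a minimal sketch. It is not the authors' implementation: an ordinary Hann-windowed FFT power spectrum stands in for the Seneff auditory spectrum, and the band edges (1000 Hz and 2000 Hz), frame length, hop size, and threshold are all assumed values chosen only for illustration. The `symmetric_kl` helper compares two spectra, which is a simplification of the sample-based Kullback-Leibler distance named in the abstract.

```python
import numpy as np

EPS = 1e-12  # guards against log(0) and division by zero


def power_spectrogram(signal, frame_len=256, hop=80):
    """Hann-windowed short-time power spectrum, one row per frame.

    Assumption: a plain FFT spectrum replaces the Seneff auditory spectrum.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def band_features(power, sample_rate=8000, low_edge=1000.0, high_edge=2000.0):
    """Per-frame features: log full-band energy, log low-band energy,
    normalized spectral centroid, log high/low energy ratio, and
    log mid-to-high band energy. Band edges are hypothetical."""
    n_fft = 2 * (power.shape[1] - 1)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    full = power.sum(axis=1)
    low = power[:, freqs < low_edge].sum(axis=1)
    mid_high = power[:, freqs >= low_edge].sum(axis=1)
    high = power[:, freqs >= high_edge].sum(axis=1)
    # Centroid is divided by the Nyquist frequency so it lies in [0, 1].
    centroid = (power * freqs).sum(axis=1) / (full + EPS) / (sample_rate / 2)
    return np.column_stack([
        np.log(full + EPS),
        np.log(low + EPS),
        centroid,
        np.log(high + EPS) - np.log(low + EPS),
        np.log(mid_high + EPS),
    ])


def candidate_boundaries(features, threshold=1.0):
    """Frame indices where the summed absolute frame-to-frame feature
    change is a local maximum above `threshold` (an assumed value)."""
    change = np.abs(np.diff(features, axis=0)).sum(axis=1)
    peaks = [i + 1 for i in range(1, len(change) - 1)
             if change[i] > threshold
             and change[i] >= change[i - 1]
             and change[i] > change[i + 1]]
    return peaks, change


def symmetric_kl(p, q):
    """Symmetric Kullback-Leibler distance between two non-negative
    spectra, each normalized to sum to one."""
    p = (np.asarray(p, dtype=float) + EPS)
    q = (np.asarray(q, dtype=float) + EPS)
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum((p - q) * np.log(p / q)))
```

In this sketch, a candidate boundary could be kept or moved by checking that `symmetric_kl` between the spectra on either side of it is large; the refinement by the first-order difference of the envelope is not shown.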