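This paper proposes a maximum-likelihood method for estimating the noise log-power spectrum. A Gaussian mixture model (GMM) is used to build a statistical model of the power-spectral envelope in each frequency band. The temporal envelope is divided into speech and non-speech classes, which correspond to the two Gaussian components of the GMM and describe the statistical distributions of speech and non-speech, respectively; the mean of the non-speech component is the optimal estimate of the noise power spectrum. A sequential learning scheme updates the model parameters frame by frame under the maximum-likelihood criterion and yields an optimal estimate of the noise power spectrum at every frame. In addition, because long-term speech absence during sequential updating can easily destabilize the model, an online minimum description length (MDL) criterion is proposed to detect long-term speech absence, thereby ensuring model stability. Experiments show that the algorithm outperforms the classic MS and IMCRA algorithms overall.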
An approach to estimating the noise logarithmic power based on maximum likelihood is presented. A two-component Gaussian mixture model (GMM) describes the distribution of the logarithmic power of noisy speech: one component represents the speech ("speech + noise") power distribution and the other represents the non-speech power distribution. The mean of the non-speech component is the optimal estimate of the noise power. An on-line method is presented to update the GMM parameter set frame by frame. Under long-term speech absence, the on-line update may fail. An on-line minimum description length (MDL) criterion is therefore presented to detect long-term speech absence or presence, which enables the model to work well under long-term speech absence. The performance of the proposed method is evaluated in a speech enhancement task. The experimental results confirm that the GMM algorithm outperforms typical methods such as the classic MS and IMCRA algorithms.
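The frame-by-frame update described above can be illustrated with a minimal sketch of a recursive two-component GMM for a single frequency band. This is not the paper's exact update rule: it assumes a generic stochastic-approximation form of EM with a fixed forgetting factor, and it takes the lower-mean component as the non-speech class. The names OnlineTwoGaussian, speech_offset, step, min_var, and demo are hypothetical, and the synthetic data in demo are invented purely for illustration.

```python
import math
import random


class OnlineTwoGaussian:
    """Recursive two-component 1-D GMM for the log-power of one frequency band."""

    def __init__(self, init_noise_mean, speech_offset=3.0, step=0.05, min_var=0.05):
        # Component 0 starts as the non-speech guess, component 1 as speech + noise.
        self.weights = [0.5, 0.5]
        self.means = [init_noise_mean, init_noise_mean + speech_offset]
        self.variances = [1.0, 1.0]
        self.step = step        # forgetting factor (assumed value)
        self.min_var = min_var  # variance floor against collapse (assumed value)

    def _posteriors(self, x):
        # Posterior probability of each component, computed in the log domain.
        log_joint = [
            math.log(w) - 0.5 * math.log(2.0 * math.pi * v) - 0.5 * (x - m) ** 2 / v
            for w, m, v in zip(self.weights, self.means, self.variances)
        ]
        peak = max(log_joint)
        unnorm = [math.exp(lj - peak) for lj in log_joint]
        total = sum(unnorm)
        return [u / total for u in unnorm]

    def update(self, x):
        """Absorb one frame's log-power x and return the current noise estimate."""
        post = self._posteriors(x)
        # Move each weight toward its posterior, floor it, then renormalize.
        for k in range(2):
            self.weights[k] += self.step * (post[k] - self.weights[k])
            self.weights[k] = max(self.weights[k], 1e-3)
        norm = sum(self.weights)
        self.weights = [w / norm for w in self.weights]
        # Posterior-weighted recursive updates of mean and variance.
        for k in range(2):
            gain = min(self.step * post[k] / self.weights[k], 1.0)
            delta = x - self.means[k]
            self.means[k] += gain * delta
            self.variances[k] += gain * (delta * delta - self.variances[k])
            self.variances[k] = max(self.variances[k], self.min_var)
        return self.noise_estimate()

    def noise_estimate(self):
        # Assumption: the non-speech component is the one with the lower mean.
        return min(self.means)


def demo(seed=0, frames=5000):
    """Track synthetic log-power: noise around 0, speech + noise around 6."""
    rng = random.Random(seed)
    tracker = OnlineTwoGaussian(init_noise_mean=-1.0)
    for _ in range(frames):
        if rng.random() < 0.4:
            x = rng.gauss(6.0, 1.0)   # speech + noise frame
        else:
            x = rng.gauss(0.0, 0.5)   # non-speech frame
        tracker.update(x)
    return tracker
```

In a full estimator, one such tracker would run per frequency band. The sketch omits the MDL safeguard: without it, a long stretch of non-speech frames would eventually pull both components toward the noise level, which is the instability the on-line MDL criterion is meant to detect.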