To improve the performance of Chinese dialect identification when training data are limited, a new dialect identification method based on the AdaBoost algorithm is proposed. The method treats identification systems built from Gaussian mixture models (GMMs) and language models as a set of weak classifiers, and combines their classification results by weighted voting to decide which dialect a test utterance belongs to. The experimental results support three conclusions: increasing the number of GMMs or the number of weak classifiers effectively improves identification performance; the longer the test utterance, the higher the identification accuracy; and, with limited training data, the AdaBoost method achieves a higher identification rate than the artificial neural network (ANN) approach.
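The weighted voting step can be illustrated with a minimal sketch. The abstract does not say which AdaBoost variant or weight formula is used, so the AdaBoost.M1-style weight log((1 - err) / err) below is an assumption. The function names, dialect labels, and error rates are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def classifier_weight(error_rate):
    """Voting weight of one weak classifier (assumed AdaBoost.M1 form)."""
    if not 0.0 < error_rate < 0.5:
        raise ValueError("error rate must lie strictly between 0 and 0.5")
    return math.log((1.0 - error_rate) / error_rate)


def weighted_vote(predictions, error_rates):
    """Return the label with the largest total weight over all weak classifiers."""
    scores = defaultdict(float)
    for label, err in zip(predictions, error_rates):
        scores[label] += classifier_weight(err)
    return max(scores, key=scores.get)


# Hypothetical decisions of three weak classifiers (each standing in for one
# GMM + language model system) on a single test utterance, together with
# their hypothetical weighted training error rates.
predictions = ["dialect_A", "dialect_B", "dialect_B"]
error_rates = [0.10, 0.40, 0.45]

print(weighted_vote(predictions, error_rates))
```

In this sketch a single accurate weak classifier can outvote two less accurate ones, which is the main difference between weighted voting and a simple majority vote.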