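Stress plays a very important role in improving the naturalness and intelligibility of speech synthesis systems and the accuracy of speech recognition systems. Based on a large-scale prosodically annotated corpus, this paper detects Mandarin stress using acoustic features together with lexical and syntactic features. Boosted classification and regression trees are used to model the acoustic, lexical and syntactic features of the current syllable, making full use of the properties of that syllable. Conditional random fields are additionally used to model the lexical and syntactic features, making good use of the context of the current syllable. Finally, the boosted classification and regression tree model and the conditional random field model are combined by weighting to obtain a hybrid model with a higher recognition rate. The hybrid model overcomes the shortcomings of the boosted classification and regression tree model and lets the two models complement each other. Experimental results show that the method classifies well, achieving 84.82% stress detection accuracy on the ASCCD corpus. Compared with previous work by others under the same conditions (the same training and test sets), the method improves accuracy by 4.01% and 1.67%, respectively. The paper also compares English stress detection with Mandarin stress detection and, through feature analysis, verifies some linguistic conclusions from another perspective.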
Stress is important for improving the naturalness and intelligibility of speech synthesis systems and the accuracy of automatic speech recognition systems. In this paper, we perform Mandarin stress detection using acoustic, lexical and syntactic features on a large-scale prosodically annotated corpus. Boosted classification and regression trees are used to model the acoustic, lexical and syntactic features, which exploits the properties of the current syllable. Conditional random fields (CRFs) are used to model the lexical and syntactic features, which exploits the context of the current syllable. A weighted combination of the boosted classification and regression tree model and the CRF model classifies better than either model alone. The combined model overcomes the deficiency of the boosted classification and regression tree model and lets the two models complement each other. The experimental results indicate that the proposed method classifies well, achieving 84.82% stress detection accuracy on ASCCD. Compared with previous work under the same conditions (the same training set and test set), the proposed method improves accuracy by 4.01% and 1.67%, respectively. We also compare the differences and similarities between Mandarin stress detection and English pitch accent detection. Based on feature analysis of the large-scale prosodically annotated corpus, we also verify some linguistic conclusions in a different way.
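The weighted combination of the two models can be illustrated with a minimal sketch. The abstract does not give the weighting scheme, so the sketch assumes a simple linear interpolation of per-syllable stress probabilities; the function names, the default `weight` and the decision `threshold` are hypothetical and not taken from the paper.

```python
from typing import List, Sequence


def combine_stress_posteriors(
    p_tree: Sequence[float],
    p_crf: Sequence[float],
    weight: float = 0.5,
) -> List[float]:
    """Linearly interpolate two per-syllable stress probabilities.

    p_tree: probabilities from the boosted tree model (assumed input).
    p_crf:  probabilities from the CRF model (assumed input).
    weight: share given to the tree model; the CRF gets 1 - weight.
    """
    if len(p_tree) != len(p_crf):
        raise ValueError("both models must score the same syllables")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * a + (1.0 - weight) * b for a, b in zip(p_tree, p_crf)]


def detect_stress(
    p_tree: Sequence[float],
    p_crf: Sequence[float],
    weight: float = 0.5,
    threshold: float = 0.5,
) -> List[bool]:
    """Label a syllable as stressed when the combined score reaches threshold."""
    combined = combine_stress_posteriors(p_tree, p_crf, weight)
    return [p >= threshold for p in combined]


if __name__ == "__main__":
    # Two syllables scored by both (hypothetical) models.
    print(detect_stress([0.9, 0.2], [0.7, 0.4]))
```

In a setup like this, `weight` would typically be tuned on held-out data rather than fixed in advance.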