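Correctly marking the pauses between phrases plays an important role in improving the naturalness of speech synthesized by a text-to-speech (TTS) system. This paper presents a method that uses a maximum entropy model to automatically identify Chinese inter-phrase pauses in real, natural speech streams. The model's feature set contains two kinds of features, acoustic and lexical, and is obtained semi-automatically: a candidate feature set is first designed by hand on the basis of experience, and a feature selection algorithm then filters the candidates, keeping the more effective features to form the final feature set, on which the maximum entropy model for Chinese inter-phrase pause identification is trained. The results of three sets of experiments show that the model achieves fairly satisfactory identification of inter-phrase pauses.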
In a TTS system, marking phrase breaks correctly is very important for the naturalness and quality of the output speech. This paper presents a maximum entropy based model for phrase break identification in Chinese sentences. The model's features fall into two types: acoustic features and linguistic features. The feature set is acquired through a semi-automatic method. First, candidate features are designed on the basis of experience; then an automatic algorithm picks out the effective features to build the final feature set; finally, the maximum entropy model is trained on that set. The experimental results show that the maximum entropy model achieves satisfactory phrase break identification.
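The semi-automatic procedure summarized above (hand-designed candidate features, automatic selection, then training) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the abstract does not name the selection algorithm, so greedy forward selection by log-likelihood gain stands in for it; the model is a binary maximum entropy (logistic) classifier fitted by plain gradient ascent rather than any particular training method; and the feature names `long_silence`, `prev_is_noun`, and `odd_index`, the toy samples, and the `min_gain` threshold are all hypothetical.

```python
import math

def predict(weights, bias, active):
    """Probability of a phrase break given the set of active binary features."""
    score = bias + sum(weights.get(f, 0.0) for f in active)
    return 1.0 / (1.0 + math.exp(-score))

def train_maxent(samples, features, epochs=200, lr=0.5):
    """Fit a binary maximum entropy model by batch gradient ascent.

    samples: list of (active_feature_set, label), label 1 = break, 0 = no break.
    features: names of the features the model is allowed to use.
    """
    weights = {f: 0.0 for f in features}
    bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad = {f: 0.0 for f in features}
        grad_bias = 0.0
        for active, label in samples:
            err = label - predict(weights, bias, active)  # observed minus expected
            grad_bias += err
            for f in active:
                if f in grad:
                    grad[f] += err
        bias += lr * grad_bias / n
        for f in features:
            weights[f] += lr * grad[f] / n
    return weights, bias

def log_likelihood(weights, bias, samples):
    """Log-likelihood of the labels under the model (probabilities clipped)."""
    total = 0.0
    for active, label in samples:
        p = min(max(predict(weights, bias, active), 1e-12), 1.0 - 1e-12)
        total += math.log(p if label == 1 else 1.0 - p)
    return total

def select_features(samples, candidates, min_gain=0.1):
    """Greedy forward selection: keep adding the candidate with the largest
    log-likelihood gain until the best gain drops below min_gain."""
    selected = []
    w, b = train_maxent(samples, selected)
    best_ll = log_likelihood(w, b, samples)
    remaining = list(candidates)
    while remaining:
        scored = []
        for f in remaining:
            w, b = train_maxent(samples, selected + [f])
            scored.append((log_likelihood(w, b, samples), f))
        ll, f = max(scored)
        if ll - best_ll < min_gain:
            break
        selected.append(f)
        remaining.remove(f)
        best_ll = ll
    return selected

# Hypothetical toy data: one acoustic cue that tracks the label and two
# candidates that are balanced across both labels.
SAMPLES = [
    ({"long_silence", "prev_is_noun", "odd_index"}, 1),
    ({"long_silence", "prev_is_noun"}, 1),
    ({"long_silence", "odd_index"}, 1),
    ({"long_silence"}, 1),
    ({"prev_is_noun", "odd_index"}, 0),
    ({"prev_is_noun"}, 0),
    ({"odd_index"}, 0),
    (set(), 0),
]
CANDIDATES = ["long_silence", "prev_is_noun", "odd_index"]

selected = select_features(SAMPLES, CANDIDATES)
weights, bias = train_maxent(SAMPLES, selected)
print(selected, predict(weights, bias, {"long_silence"}))
```

On this toy data the informative acoustic cue is picked first, because it is the only candidate that raises the training log-likelihood on its own. A real system would score candidates on held-out data and use far richer acoustic and lexical features than these placeholders.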