To address the weak generalization of traditional hard-decision-tree Tibetan speech synthesis systems, an improved binary soft decision tree algorithm is designed that estimates the parameters of the Tibetan speech synthesis model from contextual factors. At each internal node, descendant nodes are selected according to their membership degrees, so each node can be viewed as a fuzzy set with context-dependent membership. Each context is therefore assigned to several overlapping leaf nodes, which improves the model's generalization and function approximation performance. A maximum-entropy smoothed distribution is used to capture local first-order moments and global second-order moments, yielding maximum likelihood estimates of the soft-decision parameters of the hidden Markov model (HMM) output probability distributions. Simulation results show that the proposed algorithm effectively improves Tibetan speech synthesis quality while meeting the real-time requirements of the application.
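The idea of assigning one context to several overlapping leaves can be illustrated with a minimal sketch. The abstract does not specify the form of the membership function, so the sigmoid gate, the `Node` class, the scalar leaf means, and the example tree below are all assumptions made for illustration, not the authors' implementation. Each internal node splits the incoming membership between its two children, each leaf's membership is the product of the gate values along its path, and the estimated parameter for a context is the membership-weighted average of the leaf parameters.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Node:
    """Hypothetical binary soft decision tree node (illustrative only)."""
    # Internal node: parameters of an assumed sigmoid gate over context features.
    weights: Optional[List[float]] = None
    bias: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    # Leaf node: a scalar stand-in for an HMM output-distribution mean.
    mean: Optional[float] = None


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def leaf_memberships(node: Node, x: Sequence[float],
                     membership: float = 1.0) -> List[Tuple[float, float]]:
    """Return (leaf_mean, membership) pairs for context vector x.

    A hard tree would follow exactly one branch; here the membership is
    split between both children, so every leaf gets a nonzero share.
    """
    if node.mean is not None:
        return [(node.mean, membership)]
    gate = sigmoid(sum(w * xi for w, xi in zip(node.weights, x)) + node.bias)
    return (leaf_memberships(node.left, x, membership * gate)
            + leaf_memberships(node.right, x, membership * (1.0 - gate)))


def soft_mean(root: Node, x: Sequence[float]) -> float:
    """Membership-weighted average of the leaf means for context x."""
    return sum(m * mu for mu, m in leaf_memberships(root, x))


# Example tree with three leaves; the gate weights are arbitrary.
root = Node(
    weights=[1.0, 0.0],
    left=Node(mean=1.0),
    right=Node(weights=[0.0, 1.0], left=Node(mean=2.0), right=Node(mean=3.0)),
)

context = [0.0, 0.0]  # hypothetical two-dimensional context feature vector
pairs = leaf_memberships(root, context)
estimate = soft_mean(root, context)
```

For the all-zero context every gate outputs 0.5, so the three leaves receive memberships 0.5, 0.25 and 0.25, which sum to 1, and the blended estimate is 1.75. In a full system the leaf parameters and gate parameters would be fitted by maximum likelihood rather than fixed by hand as they are here.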