Building on an analysis of several currently popular system combination methods for statistical machine translation, this paper proposes an improved multi-system combination framework that integrates Minimum Bayes Risk (MBR) decoding with multi-feature Confusion Network (CN) decoding. Combination proceeds in two steps: (1) an MBR decoder selects the minimum-risk hypothesis from the N-best outputs of the individual translation systems to serve as the alignment reference; (2) the remaining N-best hypotheses are aligned to this reference to construct the confusion network. The multi-feature CN is based on a log-linear model, which brings more effective knowledge sources into the selection of the optimal path. The combined output improves the BLEU score by 2.19% over the best single system before combination. For alignment, we propose a modified Translation Edit Rate (TER) criterion, GIZA-TER, which allows more effective phrase reordering in the CN. Significance tests confirm the effectiveness of the proposed methods.
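Step (1), choosing the alignment reference by MBR decoding, can be illustrated with a minimal sketch. It is not the implementation used in this work. It assumes that each hypothesis carries a posterior weight and that the loss is a length-normalized word-level edit distance, a simplified TER-like loss without block shifts. The names `edit_distance`, `loss`, and `mbr_select` are hypothetical.

```python
from typing import List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Word-level Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr.append(min(prev[j] + 1,          # delete wa
                            curr[j - 1] + 1,      # insert wb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def loss(hyp: Sequence[str], ref: Sequence[str]) -> float:
    """Assumed loss: edit distance normalized by the reference length."""
    return edit_distance(hyp, ref) / max(len(ref), 1)


def mbr_select(hyps: List[List[str]], probs: List[float]) -> int:
    """Return the index of the hypothesis with minimum expected loss.

    The expected loss of a hypothesis is taken against every hypothesis in
    the pooled N-best list, weighted by its normalized posterior weight.
    """
    total = sum(probs)
    risks = [
        sum(p / total * loss(h, other) for other, p in zip(hyps, probs))
        for h in hyps
    ]
    return min(range(len(hyps)), key=risks.__getitem__)


# Toy pooled N-best list with uniform weights (illustrative data only).
hyps = [s.split() for s in (
    "the cat sat on the mat",
    "the cat sat on a mat",
    "a cat is on the mat",
)]
reference_index = mbr_select(hyps, [1.0, 1.0, 1.0])
```

On this toy list the first hypothesis is selected, because it is closest on average to the other two. In a full system the weights would come from the translation systems' scores, and the selected hypothesis would then serve as the skeleton to which the remaining hypotheses are aligned.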