Data sparsity and the limited size of bilingual corpora prevent many high-quality phrase pairs from being extracted. To address this problem in phrase-based statistical machine translation, the proposed method decomposes, substitutes, and recombines the phrase pairs produced by the conventional phrase extraction algorithm, generating example-based phrase pairs that the conventional method cannot extract. On Chinese-to-English newswire and spoken-language translation tasks, the method clearly improves translation quality over the baseline system on multiple test sets, with BLEU gains of about 1% on some of them.
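The decompose, substitute, and generate steps can be illustrated with a toy sketch. It is only one possible reading of those operations, not the authors' algorithm: the slot marker `SLOT`, the functions `decompose`, `substitute`, and `generate`, the toy phrase table, and the absence of any alignment constraint or scoring are all assumptions made for this example.

```python
from typing import Optional, Set, Tuple

Tokens = Tuple[str, ...]
PhrasePair = Tuple[Tokens, Tokens]
SLOT = "<X>"  # hypothetical placeholder marking the removed sub-phrase


def find(seq: Tokens, sub: Tokens) -> int:
    """Return the index of the first occurrence of sub in seq, or -1."""
    for i in range(len(seq) - len(sub) + 1):
        if seq[i:i + len(sub)] == sub:
            return i
    return -1


def decompose(pair: PhrasePair, sub: PhrasePair) -> Optional[PhrasePair]:
    """Cut sub out of pair on both sides, leaving a slot.

    Returns None if sub does not occur on both sides or equals pair itself.
    """
    (src, tgt), (s_sub, t_sub) = pair, sub
    i, j = find(src, s_sub), find(tgt, t_sub)
    if i < 0 or j < 0 or pair == sub:
        return None
    return (src[:i] + (SLOT,) + src[i + len(s_sub):],
            tgt[:j] + (SLOT,) + tgt[j + len(t_sub):])


def substitute(template: PhrasePair, filler: PhrasePair) -> PhrasePair:
    """Fill the slot on both sides of template with filler."""
    src, tgt = template
    i, j = src.index(SLOT), tgt.index(SLOT)
    return (src[:i] + filler[0] + src[i + 1:],
            tgt[:j] + filler[1] + tgt[j + 1:])


def generate(table: Set[PhrasePair]) -> Set[PhrasePair]:
    """Return candidate phrase pairs that are not already in table."""
    templates = set()
    for pair in table:
        for sub in table:
            template = decompose(pair, sub)
            if template is not None:
                templates.add(template)
    candidates = set()
    for template in templates:
        for filler in table:
            candidate = substitute(template, filler)
            if candidate not in table:
                candidates.add(candidate)
    return candidates


# Toy phrase table standing in for the output of conventional extraction.
table = {
    (("我", "喜欢", "苹果"), ("i", "like", "apples")),
    (("苹果",), ("apples",)),
    (("香蕉",), ("bananas",)),
}
new_pairs = generate(table)
```

Here the pair for "我 喜欢 苹果" is decomposed into a template with a slot in place of "苹果" / "apples", and filling that slot with "香蕉" / "bananas" yields a pair absent from the original table. The sketch also produces implausible candidates (for example, filling the slot with the full sentence pair), so a usable version would need filtering and scoring, which the abstract does not describe.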