In the automatic evaluation of machine translation, matching words or phrases that express the same meaning in different ways is a major challenge. Many studies extract paraphrases from bilingual parallel corpora or comparable corpora to improve the matching between machine translations and human reference translations. However, such corpora are expensive to build, and for some language pairs they are hard to obtain in large quantities. We propose a method that extracts paraphrases from monolingual text in the target language by constructing Markov networks of words, and we use these paraphrases to improve the correlation between automatic evaluation metrics and human judgments of machine translation. Experimental results on the WMT14 Metrics task show that the performance of paraphrases extracted from monolingual text is comparable to that of paraphrases extracted from bilingual parallel corpora. The proposed method can therefore reduce the cost of paraphrase extraction while preserving paraphrase quality.