Most phrase-based statistical machine translation systems treat any contiguous sequence of words as a phrase, without considering whether it forms a reasonable phrase. This paper uses two measures, C-value and phrase cohesion, to filter the phrase table effectively, which reduces the search space and also improves translation quality. Experiments show that the phrase table can be reduced to 78% of its original size while the BLEU score rises by 0.02, and that when it is reduced to 47.5% of its original size the BLEU score still rises by 0.0158.
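The abstract does not give the scoring formula. As an illustration only, the sketch below uses the commonly cited C-value definition from the term-extraction literature: a candidate's frequency is discounted by the mean frequency of the longer candidates that contain it, then weighted by log2 of its length. The function names, the frequency table, and the threshold are hypothetical, and the paper's exact variant and its combination with phrase cohesion are not specified here. Under this definition single-word phrases score 0, since log2(1) = 0.

```python
import math
from typing import Dict, Set, Tuple

Phrase = Tuple[str, ...]


def is_nested(short: Phrase, long: Phrase) -> bool:
    """Return True if `short` is a contiguous sub-sequence of the strictly longer `long`."""
    n, m = len(short), len(long)
    if n >= m:
        return False
    return any(long[i:i + n] == short for i in range(m - n + 1))


def c_values(freq: Dict[Phrase, int]) -> Dict[Phrase, float]:
    """Score every candidate phrase with the textbook C-value (assumed variant)."""
    scores: Dict[Phrase, float] = {}
    for a, f_a in freq.items():
        # Frequencies of all longer candidates that contain `a`.
        containing = [f_b for b, f_b in freq.items() if is_nested(a, b)]
        if containing:
            # Discount by the mean frequency of the containing candidates.
            adjusted = f_a - sum(containing) / len(containing)
        else:
            adjusted = float(f_a)
        scores[a] = math.log2(len(a)) * adjusted
    return scores


def filter_phrases(freq: Dict[Phrase, int], threshold: float) -> Set[Phrase]:
    """Keep only candidates whose C-value reaches the (hypothetical) threshold."""
    return {p for p, s in c_values(freq).items() if s >= threshold}
```

In a real system the frequencies would come from the source side of the extracted phrase table, and the threshold would be tuned against BLEU on held-out data, trading table size against translation quality as the reported results suggest.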