To address the phrase sparseness problem caused by the exact-matching strategy in phrase-based statistical machine translation (SMT), this paper proposes a phrase-similarity-based SMT model that brings the example-based translation method into SMT. During decoding, for a source phrase that does not appear in the training corpus, the model computes its similarity to the source phrases in the phrase table, retrieves similar example phrases by fuzzy matching, and constructs a translation for it from those examples. Compared with exact matching, similarity-based fuzzy matching makes fuller use of the phrase table and alleviates the phrase sparseness problem. Experiments show that the model significantly improves translation quality and outperforms Moses, a state-of-the-art phrase-based SMT system.
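The fuzzy-matching lookup described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not define the similarity measure, so the token-overlap ratio from Python's `difflib.SequenceMatcher` is used as an assumed stand-in, and the names `similarity`, `fuzzy_lookup`, `threshold`, and the toy `phrase_table` are hypothetical. The step that builds a new translation from the retrieved example is not shown.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Token-level similarity in [0, 1]. Assumed stand-in measure, not the paper's."""
    return SequenceMatcher(None, a, b).ratio()


def fuzzy_lookup(phrase, phrase_table, threshold=0.6):
    """Return (example_source, example_target, score), or None if nothing is close enough.

    An exact match is returned with score 1.0; otherwise the most similar
    source phrase whose score reaches the (hypothetical) threshold is returned.
    """
    tokens = tuple(phrase.split())
    if tokens in phrase_table:  # exact matching, as in a standard phrase-based system
        return tokens, phrase_table[tokens], 1.0
    best = None
    for src, tgt in phrase_table.items():  # fuzzy matching over all table entries
        score = similarity(tokens, src)
        if score >= threshold and (best is None or score > best[2]):
            best = (src, tgt, score)
    return best


# Toy phrase table (hypothetical data): source token tuples mapped to target strings.
phrase_table = {
    ("read", "a", "book"): "读 一 本 书",
    ("eat", "an", "apple"): "吃 一 个 苹果",
}

match = fuzzy_lookup("read a magazine", phrase_table)
```

Here the unseen phrase "read a magazine" has no exact entry, but it shares two of three tokens with "read a book", so that entry is retrieved as the example. A linear scan over the table is used only for clarity; a real phrase table would need an index to make this practical.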