In recent years, phrase-based statistical translation models have attracted wide attention in machine translation research and have achieved good translation performance. However, current phrase-based translation systems use exact matching during decoding, which often leads to data sparseness. On the one hand, some phrases have no exact match in the phrase table obtained from training and therefore become unknown phrases; on the other hand, a large number of phrases in the phrase table are never fully used. To address this, we propose a translation method based on fuzzy phrase matching and sentence expansion. For a phrase that does not occur in the phrase table, we use fuzzy matching to find similar phrases in the table, and then substitute each similar phrase for the original one to generate expanded sentences, all of which are then translated. Because not every expanded sentence yields a better translation than the original sentence, a combination of classifiers is applied after translation to select the best result. Experiments show that this method effectively improves the translation quality of the system.
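The fuzzy matching and sentence expansion steps described above can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the phrase length, or the threshold, so the position-wise word overlap, the fixed n-gram length `n`, the `threshold` value, and the function names below are all assumptions made for illustration only. Translating the expanded sentences and selecting the best output with the classifier combination are not shown.

```python
def similarity(a, b):
    """Fraction of positions at which two equal-length phrases share the same word.
    (Assumed measure; the abstract does not define the similarity function.)"""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def fuzzy_matches(phrase, phrase_table, threshold=0.5, top_k=3):
    """Return up to top_k same-length phrases from the table that resemble `phrase`."""
    scored = [
        (similarity(phrase, cand), cand)
        for cand in phrase_table
        if len(cand) == len(phrase)
    ]
    # Keep only candidates at or above the (hypothetical) threshold.
    scored = [(s, c) for s, c in scored if s >= threshold]
    # Sort by descending score, then alphabetically for a deterministic order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [cand for _, cand in scored[:top_k]]


def expand_sentence(tokens, phrase_table, n=3, threshold=0.5):
    """Build expanded sentences by replacing each unknown n-gram with similar known phrases."""
    expansions = []
    for i in range(len(tokens) - n + 1):
        phrase = tuple(tokens[i:i + n])
        if phrase in phrase_table:
            continue  # exact match exists, so this is not an unknown phrase
        for cand in fuzzy_matches(phrase, phrase_table, threshold):
            expansions.append(tokens[:i] + list(cand) + tokens[i + n:])
    return expansions


# Toy phrase table of source-side phrases (illustrative data only).
table = {("read", "a", "book"), ("a", "red", "car")}
print(expand_sentence(["I", "read", "a", "novel"], table))
```

In this toy example the unknown phrase "read a novel" is replaced by the similar known phrase "read a book", giving one expanded sentence. In the approach described above, the original and expanded sentences would all be translated, and the classifier combination would then choose among the resulting translations.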