Phrase pair extraction is a key technique in phrase-based statistical machine translation. The widely used extraction method proposed by Och depends heavily on the word alignment, so it extracts only phrase pairs that are fully consistent with that alignment. This paper proposes a phrase pair extraction method based on a "flexible scale": for phrase pairs that are not fully consistent with the word alignment, part-of-speech (POS) tags and dictionary information are used to decide whether to extract them. Relaxing the full-consistency constraint allows target phrases to be found for more source phrases, including phrase pairs that Och's method cannot obtain. Experiments show that the proposed method significantly outperforms Och's method.
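To make the contrast concrete, the following is a minimal Python sketch, not the authors' implementation. The strict branch follows the usual definition of consistency (a phrase pair must contain at least one alignment link and no link may cross its boundary). The relaxed branch is purely an assumption for illustration, since the abstract does not state the actual rule: a boundary-crossing link is ignored when its source word carries a function-word POS tag and the word pair is absent from a bilingual dictionary. The names `extract`, `make_tolerate`, `FUNCTION_TAGS`, and the toy sentence, tags, and dictionary are all hypothetical.

```python
from typing import Callable, List, Optional, Set, Tuple

Link = Tuple[int, int]   # (source index, target index)
Span = Tuple[int, int]   # inclusive (start, end)

# Assumed Penn Chinese Treebank style function-word tags (illustrative only).
FUNCTION_TAGS = {"DEG", "DEC", "AS", "SP"}


def violations(links: Set[Link], s1: int, s2: int, t1: int, t2: int) -> List[Link]:
    """Links with exactly one end inside the candidate phrase pair."""
    return [(s, t) for s, t in links if (s1 <= s <= s2) != (t1 <= t <= t2)]


def extract(n_src: int, n_tgt: int, links: Set[Link], max_len: int = 4,
            tolerate: Optional[Callable[[Link], bool]] = None) -> List[Tuple[Span, Span]]:
    """Enumerate phrase pairs; with tolerate=None this is the strict criterion."""
    pairs = []
    for s1 in range(n_src):
        for s2 in range(s1, min(s1 + max_len, n_src)):
            for t1 in range(n_tgt):
                for t2 in range(t1, min(t1 + max_len, n_tgt)):
                    # Require at least one link inside the candidate pair.
                    if not any(s1 <= s <= s2 and t1 <= t <= t2 for s, t in links):
                        continue
                    bad = violations(links, s1, s2, t1, t2)
                    # Strict: no crossing links. Relaxed: every crossing link is tolerated.
                    if not bad or (tolerate is not None and all(tolerate(l) for l in bad)):
                        pairs.append(((s1, s2), (t1, t2)))
    return pairs


def make_tolerate(src, src_pos, tgt, dictionary) -> Callable[[Link], bool]:
    """Hypothetical relaxation rule combining POS and dictionary evidence."""
    def tolerate(link: Link) -> bool:
        s, t = link
        return src_pos[s] in FUNCTION_TAGS and (src[s], tgt[t]) not in dictionary
    return tolerate


# Toy example: the link between "的" and "book" is an alignment error.
src, src_pos = ["他", "的", "书"], ["PN", "DEG", "NN"]
tgt = ["his", "book"]
links = {(0, 0), (1, 1), (2, 1)}
dictionary = {("他", "his"), ("书", "book")}

strict = extract(len(src), len(tgt), links)
relaxed = extract(len(src), len(tgt), links,
                  tolerate=make_tolerate(src, src_pos, tgt, dictionary))
print(sorted(set(relaxed) - set(strict)))
```

In this toy case the strict criterion cannot extract "书" / "book" on its own, because the erroneous link from "的" crosses the boundary; the assumed relaxation ignores that link and recovers the pair. A real system would need a carefully tuned criterion, since tolerating links too freely admits noisy phrase pairs.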