Uyghur is a typical agglutinative language, and its complex morphology and large number of affixes degrade the quality of Uyghur-Chinese word alignment. This paper proposes running morphological analysis on Uyghur words to separate stems from affixes before alignment. In addition, because Uyghur follows phonetic harmony rules, the harmony-conditioned variants of each affix are replaced by a single uniform symbol, so that every variant of an affix takes the same form. Together, these steps aim to reduce data sparseness in Uyghur-Chinese word alignment. We applied this preprocessing to the Uyghur-Chinese parallel corpus provided by the Xinjiang Key Laboratory of Multilingual Information Technology and then aligned the sentences with GIZA++. Experimental results show that the method clearly improves word alignment and also noticeably improves the quality of Uyghur-to-Chinese machine translation.
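The preprocessing idea (split each word into stem and affixes, then collapse phonetically conditioned affix variants into one symbol) can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes words are already segmented by some morphological analyzer into a hypothetical `stem+affix+affix` format in Uyghur Latin script, and the affix table, the symbol names (`+PL`, `+DAT`, `+LOC`, `+ABL`) and the helper names are hypothetical examples covering only a few common suffixes.

```python
"""Sketch: collapse harmony-conditioned Uyghur affix variants into one symbol.

Assumptions (hypothetical, for illustration only):
- input words are already segmented as 'stem+affix+affix' in Uyghur Latin script;
- the small affix table below stands in for a full affix inventory.
"""

# Hypothetical table: each set of phonetically conditioned variants -> one symbol.
AFFIX_CLASSES = {
    "+PL": {"lar", "ler"},               # plural
    "+DAT": {"gha", "ge", "qa", "ke"},   # dative
    "+LOC": {"da", "de", "ta", "te"},    # locative
    "+ABL": {"din", "tin"},              # ablative
}

# Invert the table into a lookup from surface variant to its class symbol.
AFFIX_TO_SYMBOL = {
    variant: symbol
    for symbol, variants in AFFIX_CLASSES.items()
    for variant in variants
}


def normalize_token(segmented: str) -> list[str]:
    """Split 'stem+affix+...' into the stem followed by symbolized affixes."""
    stem, *affixes = segmented.split("+")
    # Affixes missing from the table are kept verbatim, prefixed with '+'
    # so they remain distinguishable from stems.
    return [stem] + [AFFIX_TO_SYMBOL.get(a, "+" + a) for a in affixes]


def normalize_sentence(line: str) -> str:
    """Return one whitespace-tokenized line, e.g. as input to a word aligner."""
    tokens: list[str] = []
    for word in line.split():
        tokens.extend(normalize_token(word))
    return " ".join(tokens)


if __name__ == "__main__":
    # 'mekteplerde' (in schools) and 'kitablarda' (in books) use different
    # plural and locative variants, but both map to '+PL +LOC'.
    print(normalize_sentence("mektep+ler+de kitab+lar+da"))
```

After normalization, the surface variants `lar`/`ler` and `da`/`de` no longer count as separate vocabulary items, so their co-occurrence statistics with the corresponding Chinese words are pooled rather than split across forms; this is the data-sparseness reduction the abstract describes.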