This paper proposes a hybrid algorithm for aligning Chinese-Uyghur sentences in parallel texts. The algorithm requires no Chinese preprocessing such as word segmentation or part-of-speech tagging. Instead, it uses lexical co-occurrence information in the bilingual corpus to automatically extract Chinese-Uyghur word correspondences, which serve as the lexicon for lexicon-based alignment, and combines this lexicon-based method with a length-based method. Experimental results confirm the effectiveness of the hybrid algorithm: on Chinese-Uyghur sentence alignment it reaches 97.5% precision and 97.1% recall.
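To make the idea concrete, the following is a minimal Python sketch of how a co-occurrence lexicon and a length cue could be combined into a single score for a candidate sentence pair. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not specify the association measure, the units counted, or the way the two cues are combined. Here Chinese is represented by character n-grams (so no segmenter is needed), the Dice coefficient is used as the association measure, and the two cues are mixed linearly with a weight `alpha`. All function names, thresholds, and the romanized toy sentences are hypothetical.

```python
from collections import Counter
from itertools import product


def zh_units(sentence, max_n=2):
    """Character n-grams of a Chinese sentence (stand-in for words; no segmenter)."""
    chars = [c for c in sentence if not c.isspace()]
    return {
        "".join(chars[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(chars) - n + 1)
    }


def ug_units(sentence):
    """Whitespace-separated tokens of a (romanized) Uyghur sentence."""
    return set(sentence.lower().split())


def char_len(sentence):
    """Sentence length in non-space characters."""
    return sum(1 for c in sentence if not c.isspace())


def build_lexicon(pairs, threshold=0.6, min_count=2):
    """Keep (zh_unit, ug_token) pairs whose Dice coefficient passes the threshold.

    Dice is an assumed association measure; the source only says that
    co-occurrence information is used.
    """
    zh_freq, ug_freq, joint = Counter(), Counter(), Counter()
    for zh, ug in pairs:
        zs, us = zh_units(zh), ug_units(ug)
        zh_freq.update(zs)
        ug_freq.update(us)
        joint.update(product(zs, us))
    lexicon = {}
    for (z, u), count in joint.items():
        dice = 2 * count / (zh_freq[z] + ug_freq[u])
        if count >= min_count and dice >= threshold:
            lexicon[(z, u)] = dice
    return lexicon


def length_ratio(pairs):
    """Average number of Uyghur characters per Chinese character."""
    return sum(char_len(ug) for _, ug in pairs) / sum(char_len(zh) for zh, _ in pairs)


def length_score(zh, ug, ratio):
    """1.0 when the Uyghur length equals the expected length, smaller otherwise."""
    expected, actual = ratio * char_len(zh), char_len(ug)
    if expected == 0 or actual == 0:
        return 0.0
    return min(expected, actual) / max(expected, actual)


def lexical_score(zh, ug, lexicon):
    """Fraction of Uyghur tokens linked by the lexicon to a unit in the Chinese sentence."""
    zs, us = zh_units(zh), ug_units(ug)
    if not us:
        return 0.0
    hits = sum(1 for u in us if any((z, u) in lexicon for z in zs))
    return hits / len(us)


def hybrid_score(zh, ug, lexicon, ratio, alpha=0.5):
    """Linear mix of the length cue and the lexical cue (alpha is an assumed weight)."""
    return alpha * length_score(zh, ug, ratio) + (1 - alpha) * lexical_score(zh, ug, lexicon)


# Toy, hand-made seed pairs for illustration only (not vetted linguistic data).
seed_pairs = [
    ("我买书", "men kitab aldim"),
    ("他读书", "u kitab oqudi"),
    ("我吃苹果", "men alma yedim"),
    ("他喝水", "u su ichti"),
]
lexicon = build_lexicon(seed_pairs)
ratio = length_ratio(seed_pairs)
```

With such a score, a candidate pair that shares lexicon entries and has a plausible length ratio ranks above a mismatched pair, for example by comparing `hybrid_score("我读书", "men kitab oqudim", lexicon, ratio)` with `hybrid_score("我读书", "u su ichti", lexicon, ratio)`. A complete aligner would additionally search over alignment patterns (such as 1-1, 1-2, and 2-1 groupings), typically with dynamic programming; that search is outside this sketch.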