In the context of dictionary-based Chinese-Mongolian word alignment, this paper proposes a method for aligning numerals, a special word class, that uses Arabic numerals as an intermediate representation. The basic idea is to convert the numerals in the Chinese text and in the Mongolian text into Arabic numerals separately, and then to compute their similarity and decide whether they correspond, that is, whether they can form an alignment link. The conversion is built on numtable, a two-dimensional table that maps the basic Chinese and Mongolian numerals to their Arabic equivalents. Each entry in numtable carries a flag with the value "1" or "2", which indicates whether the basic numeral is a multiple of "10". Guided by this flag, the conversion module applies a series of inference steps to turn Chinese and Mongolian numerals into the corresponding Arabic numerals. The conversion takes into account the characteristics of Chinese numerals and of Mongolian numerals and, with word alignment as the goal, applies a different conversion strategy to each type of numeral within each language.
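The conversion idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the table contents, the names `ZH_NUMTABLE`, `MN_NUMTABLE`, `to_arabic` and `numerals_align`, the reading of the flag (1 for a plain digit, 2 for a multiple of 10), the use of Cyrillic Mongolian forms, and the single shared accumulation rule are all assumptions made for illustration. The sketch also replaces the similarity computation with a simple equality test and only handles values below 10,000.

```python
# Hypothetical numtable entries: token -> (Arabic value, flag).
# Assumption: flag 1 marks a plain digit, flag 2 marks a multiple of 10.
DIGIT, TEN_MULTIPLE = 1, 2

ZH_NUMTABLE = {
    "零": (0, DIGIT), "一": (1, DIGIT), "二": (2, DIGIT), "两": (2, DIGIT),
    "三": (3, DIGIT), "四": (4, DIGIT), "五": (5, DIGIT), "六": (6, DIGIT),
    "七": (7, DIGIT), "八": (8, DIGIT), "九": (9, DIGIT),
    "十": (10, TEN_MULTIPLE), "百": (100, TEN_MULTIPLE),
    "千": (1000, TEN_MULTIPLE),
}

# A deliberately tiny Mongolian table (Cyrillic script, attributive forms).
MN_NUMTABLE = {
    "нэг": (1, DIGIT), "гурав": (3, DIGIT), "гурван": (3, DIGIT),
    "арван": (10, TEN_MULTIPLE), "хорин": (20, TEN_MULTIPLE),
    "зуун": (100, TEN_MULTIPLE),
}


def to_arabic(tokens, numtable):
    """Convert a sequence of basic numeral tokens into an integer."""
    total = 0
    pending = None  # digit waiting for a possible multiplier
    for token in tokens:
        value, flag = numtable[token]  # KeyError on an unknown token
        if flag == DIGIT:
            pending = value
        elif pending is not None:
            # A digit followed by a multiple of 10 multiplies it: 三 百 -> 300.
            total += pending * value
            pending = None
        else:
            # A bare multiple of 10 is added as-is: 十 -> 10, хорин -> 20.
            total += value
    if pending is not None:
        total += pending
    return total


def numerals_align(zh_numeral, mn_numeral):
    """Treat two numerals as an alignment link if their Arabic values match."""
    zh_value = to_arabic(list(zh_numeral), ZH_NUMTABLE)      # one token per character
    mn_value = to_arabic(mn_numeral.split(), MN_NUMTABLE)    # whitespace-separated words
    return zh_value == mn_value


print(to_arabic(list("三百二十三"), ZH_NUMTABLE))          # 323
print(numerals_align("二十三", "хорин гурав"))             # True
```

Here the flag is the only information the loop needs to choose between multiplying and adding, which is one plausible reading of how a flag of this kind could drive the conversion.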