Identifying the translingual equivalents of named entities is essential to multilingual natural language processing. Several approaches to named entity translation, such as bilingual dictionary lookup, word/sub-word translation, and transliteration models, have been explored in recent years. Another promising approach is to extract named entity translation pairs automatically from a parallel corpus, but existing methods require the named entities to be annotated, manually or automatically, in both languages. In this paper, we propose a new approach that requires named entity annotation only for the source language, leaves the target language unannotated, and extracts named entity translation pairs from the word alignment produced by a trained HMM alignment model. Experiments are carried out on a Chinese-English parallel corpus, with Chinese as the source language and English as the target language. The results show that our approach extracts named entity pairs with relatively high precision, even when the word alignment is only partially correct.
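The extraction step can be illustrated with a minimal sketch. It assumes the HMM aligner's output is available as a set of (source index, target index) links and that the source-side entities are given as token spans. The functions `project_entity` and `extract_pairs`, the span-covering rule, and the consistency filter are hypothetical illustrations, not the extraction procedure used in this paper.

```python
from typing import List, Optional, Set, Tuple

# Word alignment as a set of (source_index, target_index) links, e.g. the
# Viterbi links of an HMM aligner (assumed input format).
Alignment = Set[Tuple[int, int]]
# Source-side entity annotation: (start, end, label), end exclusive.
Entity = Tuple[int, int, str]


def project_entity(start: int, end: int, alignment: Alignment) -> Optional[Tuple[int, int]]:
    """Map the source span [start, end) to a contiguous target span, or None."""
    linked = [t for s, t in alignment if start <= s < end]
    if not linked:
        return None  # entity is unaligned: no candidate translation
    # Cover everything between the leftmost and rightmost linked target words,
    # which tolerates missing links inside a multi-word name.
    t_start, t_end = min(linked), max(linked) + 1
    # Assumed consistency filter: reject the span if any target word inside it
    # is linked to a source word outside the entity.
    for s, t in alignment:
        if (t_start <= t < t_end) and not (start <= s < end):
            return None
    return t_start, t_end


def extract_pairs(src: List[str], tgt: List[str], entities: List[Entity],
                  alignment: Alignment) -> List[Tuple[str, str, str]]:
    """Return (source entity, target translation, label) candidates."""
    pairs = []
    for start, end, label in entities:
        span = project_entity(start, end, alignment)
        if span is not None:
            # Chinese tokens are joined without spaces, English tokens with spaces.
            pairs.append(("".join(src[start:end]), " ".join(tgt[span[0]:span[1]]), label))
    return pairs


if __name__ == "__main__":
    src = ["布什", "总统", "访问", "北京"]
    tgt = ["President", "Bush", "visited", "Beijing"]
    links = {(0, 1), (1, 0), (2, 2), (3, 3)}
    print(extract_pairs(src, tgt, [(0, 1, "PER"), (3, 4, "LOC")], links))
    # → [('布什', 'Bush', 'PER'), ('北京', 'Beijing', 'LOC')]
```

Covering the gap between the outermost linked target words lets a span survive a missing link inside a name, while the consistency filter discards spans that an erroneous link would stretch over unrelated words. Both are only one plausible way to cope with partially correct alignments.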