A Method for Extracting Translation Equivalent Pairs of Named Entities

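ISSN: 1003-0077
Journal: Journal of Chinese Information Processing (中文信息学报)
Date: 0
Pages: 172-175
Language: Chinese
Classification: TP391 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
Author affiliation: [1] State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu 210093
Funding: National 863 Program (2006AA01Z143, 2006AA01Z139); National Natural Science Foundation of China (60673043); Natural Science Foundation of Jiangsu Province (BK2006117)
Related project: Research on Chinese Coreference Resolution Based on Statistical Relational Learning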

Keywords: artificial intelligence, machine translation, named entity, translingual equivalence, HMM, alignment model

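Chinese abstract:

Translation equivalent pairs of named entities are of great importance in multilingual processing. Over the past few years, methods such as bilingual dictionary lookup and transliteration models have been proposed. Another highly valuable approach is to extract named entity translation equivalent pairs automatically from a parallel corpus; existing methods require the texts of both languages in the bilingual corpus to be annotated with named entities in advance. This paper proposes a method that requires named entity annotation only for the source language of the corpus, with no annotation of the target language, and then uses the word alignment produced by a trained HMM to extract named entity translation equivalent pairs. In the experiments, Chinese is the source language and English is the target language. The results show that, even when the alignment model is only partially accurate, the method yields named entity translation equivalent pairs with relatively high precision.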

English abstract:

Identifying translingual equivalents of named entities is important for multilingual natural language processing. Several approaches to named entity translation, such as bilingual dictionary lookup, word/sub-word translation, and transliteration, have been explored in recent years. Another promising approach is to extract named entity translingual equivalents automatically from a parallel corpus, which usually requires the named entities to be annotated, manually or automatically, in both languages. In this paper, we propose a new approach that extracts named entity equivalents from a parallel corpus using only source-language annotation and the result of HMM word alignment. The experiment is carried out on a Chinese-English parallel corpus, with Chinese as the source language and English as the target language. The results show that the new approach yields named entity pairs with relatively high precision, even when the word alignment is only partially correct.
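
The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea: project source-side named entity spans onto the target sentence through word alignment links. The function names, the half-open span convention, and the "smallest covering window" heuristic are all assumptions made for illustration, not the authors' method. The alignment links are assumed to come from an external HMM word aligner.

```python
from typing import Iterable, List, Optional, Sequence, Tuple

Span = Tuple[int, int]  # half-open [start, end) over token positions
Link = Tuple[int, int]  # (source index, target index) from a word aligner


def project_entity(span: Span, links: Iterable[Link]) -> Optional[Span]:
    """Project a source-side entity span onto the target side via alignment links."""
    start, end = span
    targets = [t for s, t in links if start <= s < end]
    if not targets:
        return None  # every token of the entity was left unaligned
    # Assumption: take the smallest contiguous target window that covers
    # all target positions linked to the entity.
    return min(targets), max(targets) + 1


def extract_pairs(
    src: Sequence[str],
    tgt: Sequence[str],
    entity_spans: Iterable[Span],
    links: Iterable[Link],
) -> List[Tuple[str, str]]:
    """Return (source entity, target candidate) pairs for one sentence pair."""
    links = list(links)
    pairs = []
    for start, end in entity_spans:
        projected = project_entity((start, end), links)
        if projected is None:
            continue
        src_text = "".join(src[start:end])  # Chinese tokens join without spaces
        tgt_text = " ".join(tgt[projected[0]:projected[1]])
        pairs.append((src_text, tgt_text))
    return pairs


if __name__ == "__main__":
    # Toy sentence pair with hand-written links (hypothetical aligner output).
    src = ["南京", "大学", "位于", "江苏"]
    tgt = ["Nanjing", "University", "is", "located", "in", "Jiangsu"]
    links = [(0, 0), (1, 1), (2, 3), (3, 5)]
    entities = [(0, 2), (3, 4)]  # source-side annotation only
    print(extract_pairs(src, tgt, entities, links))
```

Only the source side carries entity annotation here; the target candidate is read off the alignment. A spurious link can widen the projected window, so in practice some filtering of candidate pairs (for example by frequency across the corpus) would likely be needed.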
