Named Entity Recognition, Disambiguation and Cross-Lingual Coreference Resolution

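ISSN: 1003-0077
Journal: Journal of Chinese Information Processing (中文信息学报)
Date: 0
Pages: 1-17
Language: Chinese
Classification: TP391 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
Author affiliation: [1] National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190
Funding: National 863 Program of China (2006AA012144); National Natural Science Foundation of China (60673042, 60875041)
Acknowledgements: The author thanks the graduate students who collaborated on this work for their contributions, in particular Wu Youzheng 吴友政 (named entity recognition and classification), Liu Feifan 刘非凡 (product name recognition, entity coreference resolution), Yang Fan 杨帆 (cross-lingual entity coreference resolution), Lu Min 陆敏 (cross-lingual entity coreference resolution), Zou Bo 邹波 (entity transliteration), Han Xianpei 韩先培 (entity disambiguation, entity attribute extraction) and Liu Kang 刘康 (entity attribute extraction, adaptation of named entity recognition systems). The author also thanks the staff and students of the National Language Resources Monitoring and Research Center for their support of this research.
Related project: Research on a probabilistic decision-action model and adaptation techniques for Chinese dependency parsing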

Author: Zhao Jun (赵军)

Keywords: computer application, Chinese information processing, named entity recognition, named entity disambiguation, named entity cross-lingual coreference resolution

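Chinese abstract (translated):

Named entities are important information-bearing linguistic units in text, and their recognition and analysis play a very important role in fields such as web information extraction, web content management and knowledge engineering. Research tasks on named entities include entity recognition, entity disambiguation, cross-lingual entity coreference resolution, entity attribute extraction and entity relation detection. This paper focuses on the state of research in named entity recognition, disambiguation and cross-lingual coreference resolution, covering the difficulties, evaluations, existing methods and current level of performance, and it analyzes and discusses the problems that most need to be solved next. The paper argues that the current state of the art in these three tasks is still far from meeting the needs of large-scale real-world applications and that deeper research is required. In terms of methodology, research should break through the limits of natural language text and directly address web page information that is massive, redundant, heterogeneous, irregular and very noisy.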

English abstract:

Named entities are important meaningful units in texts. The recognition and analysis of named entities is of great significance in the fields of Web information extraction, Web content management and knowledge engineering. Research on named entities includes named entity recognition, disambiguation, cross-lingual coreference resolution, attribute extraction and relation detection. Focusing on named entity recognition, disambiguation and cross-lingual coreference resolution, the paper gives a thorough survey of the state of the art of these tasks, including the challenges, methods, evaluations, performance and the problems to be solved. The paper argues that the performance of current systems for named entity recognition, disambiguation and cross-lingual coreference resolution is far from the requirements of large-scale practical applications. In terms of methods and approaches, named entity recognition, disambiguation and cross-lingual coreference resolution should be carried beyond natural language texts and investigated directly on large-scale, redundant, heterogeneous, ill-formed and noisy web pages.
