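The goal of entity linking is to correctly link entity mentions extracted from text to their corresponding entities in a knowledge base. Current mainstream entity linking algorithms fall roughly into two categories: algorithms based on context similarity and graph-based collective algorithms. Each category has its own strengths and weaknesses. The former is good at distinguishing entities from the perspective of contextual semantics, but has difficulty making full use of the knowledge already organized in the knowledge base to support its decisions; the latter makes better use of the semantic relations between entities in the knowledge base, but struggles to distinguish conceptually similar entities when contextual information is insufficient. This paper proposes a collective entity linking algorithm based on semantic consistency, which makes better use of the structured semantic relations between entities in the knowledge base and thereby improves the algorithm's ability to discriminate between conceptually similar entities. Experimental results show that the algorithm effectively improves the precision and recall of entity linking results, significantly outperforms current mainstream algorithms, performs stably on entity linking tasks for both long and short texts, and adapts and generalizes well.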
The goal of entity linking is to link entity mentions in a document to their corresponding entities in a knowledge base. The prevalent approaches can be divided into two categories: similarity-based approaches and graph-based collective approaches. Each has its pros and cons. Similarity-based approaches are good at distinguishing entities from a semantic perspective, but usually ignore the relationships between entities; graph-based approaches make better use of the relations between entities, but usually discriminate poorly between similar entities. In this work, we present a collective entity linking algorithm based on semantic consistency that takes full advantage of the structured relationships between entities in the knowledge base to improve its ability to discriminate between similar entities. We extensively evaluate our method on two public datasets, and the experimental results show that it effectively improves the precision and recall of entity linking results. Overall, the proposed algorithm significantly outperforms other state-of-the-art algorithms.