为了消除文本中命名实体的歧义,提出了一种结合实体链接与实体聚类的命名实体消歧算法,结合2种方法,可弥补单独使用其中一种方法的局限.该算法在背景文本中将待消歧实体指称扩充为全称,使用扩充后的全称在英文维基百科知识库中生成候选实体集合,同时提取多种特征对候选实体集合进行排序,对于知识库中没有对应实体的指称使用聚类消歧.实验结果表明,该算法在KBP2011评测数据上的F值为0.746,在KBP2012评测数据上的F值为0.670.
In order to eliminate the ambiguity of named entities in the documents, a named entity disam- biguation algorithm combining entity linking and entity clustering is proposed, and the proposed algorithm combines two methods to compensate for the limitations of only using one of the methods. The proposed algorithm expands the mentions in the background document firstly, and generates candidates in the Eng- lish Wikipedia knowledge base for expansions secondly, then extracts a variety of features to rank candi- dates, lastly uses clustering to disambiguate the mentions which has none candidates in the knowledge base. The experimental results show that, in the proposed algorithm, the F measure in KBP2011 data set is 0. 746 and the F measure in KBP2012 data set is 0. 670.