We propose a graph-based, iterative approach to joint entity resolution. First, an entity data object relationship graph is built from the input dataset, which consists of multiple classes of related data objects. A hybrid similarity, combining a structural similarity based on semantic paths with an attribute-based similarity, is used to decide whether two data objects match. Matched pairs are then merged, and the corresponding node and its neighborhood in the object graph are locally contracted. Together, these two operations enrich the local semantics of the graph and may generate new candidate data object pairs in that neighborhood, which are resolved in later iterations. This generation of new candidates, called similarity propagation, makes resolution an iterative process: as iterations proceed, the semantics of the object graph become progressively richer, which improves the accuracy of joint entity resolution. Experimental evaluation shows that the proposed approach achieves higher accuracy than existing joint entity resolution approaches and relationship-based single-class entity resolution approaches.
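The merge, contract, and propagate loop described above can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions. The names `ObjectGraph`, `hybrid_sim`, `resolve`, `alpha`, and `threshold` are hypothetical. Attribute similarity is a token-level Jaccard score. The structural similarity is a simplified stand-in for the semantic-path similarity, comparing only length-1 paths given as (relation, neighbor) pairs. The initial candidates are all same-type pairs, with no blocking.

```python
from collections import deque
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0


class ObjectGraph:
    """Hypothetical minimal entity data object relationship graph."""

    def __init__(self):
        self.type = {}   # object id -> entity class, e.g. "paper", "author"
        self.attrs = {}  # object id -> set of lowercase attribute tokens
        self.edges = {}  # object id -> set of (relation, neighbor id)

    def add_object(self, oid, etype, text):
        self.type[oid] = etype
        self.attrs[oid] = set(text.lower().split())
        self.edges[oid] = set()

    def relate(self, a, rel, b):
        self.edges[a].add((rel, b))
        self.edges[b].add((rel, a))

    def neighbors(self, oid):
        return {n for _, n in self.edges[oid]}

    def merge(self, keep, drop):
        """Merge `drop` into `keep`; redirect drop's edges (local contraction)."""
        self.attrs[keep] |= self.attrs.pop(drop)
        for rel, n in self.edges.pop(drop):
            self.edges[n].discard((rel, drop))
            if n != keep:  # avoid creating a self-loop on the merged node
                self.edges[n].add((rel, keep))
                self.edges[keep].add((rel, n))
        del self.type[drop]


def hybrid_sim(g, a, b, alpha=0.5):
    """Weighted mix of attribute and (simplified) structural similarity."""
    attr = jaccard(g.attrs[a], g.attrs[b])
    # Assumption: length-1 (relation, neighbor) paths stand in for semantic paths.
    struct = jaccard(g.edges[a], g.edges[b])
    return alpha * attr + (1 - alpha) * struct


def resolve(g, threshold=0.5, alpha=0.5):
    """Iteratively match, merge, and propagate; returns matched (keep, drop) pairs."""
    ids = sorted(g.type)
    queue = deque((a, b) for a, b in combinations(ids, 2) if g.type[a] == g.type[b])
    matches = []
    while queue:
        a, b = queue.popleft()
        if a not in g.type or b not in g.type:
            continue  # one side was already merged into another object
        if hybrid_sim(g, a, b, alpha) < threshold:
            continue  # may be re-queued later if its neighborhood changes
        g.merge(a, b)
        matches.append((a, b))
        # Similarity propagation: same-type neighbors of the merged node
        # now share a neighbor, so they become (new or renewed) candidates.
        for x, y in combinations(sorted(g.neighbors(a)), 2):
            if g.type[x] == g.type[y]:
                queue.append((x, y))
    return matches
```

In this toy run, the two author records are rejected at first because they share no neighbors. Once their papers are merged, both authors point to the same paper node, and the re-queued pair now matches:

```python
g = ObjectGraph()
g.add_object("p1", "paper", "Iterative joint entity resolution")
g.add_object("p2", "paper", "iterative joint entity resolution")
g.add_object("a1", "author", "Smith J")
g.add_object("a2", "author", "Smith John")
g.relate("p1", "authorship", "a1")
g.relate("p2", "authorship", "a2")
print(resolve(g, threshold=0.4))  # [('p1', 'p2'), ('a1', 'a2')]
```

The loop always terminates, because each merge removes one object and new candidates are queued only after a merge.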