To address the difficulty of matching corresponding point entities in crowdsourced geographic data, where inconsistent deviations in distance and direction exist between corresponding entities, this paper proposes a point entity matching algorithm based on Thiessen polygons. Because the Thiessen polygons of matched point datasets correspond closely to one another, the uncertain point-to-point matching is converted into matching of the corresponding Thiessen polygons, which agree more strongly. First, the point pairs in which each point is contained in the other's Thiessen polygon are collected, and a distance threshold is computed from the probability distribution of the pair distances; this threshold serves as the first condition for confirming corresponding entities. Second, the position and shape similarity of the Thiessen polygons serves as the second condition. Finally, the entities with the highest similarity are confirmed as corresponding entities. In experiments comparing the algorithm with several existing point entity matching algorithms, it achieved relatively high recall and precision and showed strong generality.
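The first matching condition can be illustrated with a minimal sketch. It relies on one geometric fact: a point lies inside the Thiessen (Voronoi) polygon of a site exactly when that site is its nearest site, so "contained in each other's Thiessen polygons" reduces to a mutual nearest-neighbor test and no polygons need to be constructed for this step. The abstract does not state how the threshold is derived from the distance distribution, so the rule used here (mean plus `k` standard deviations) and all function names are assumptions for illustration only. The second condition, polygon position and shape similarity, is not covered.

```python
import math
import statistics


def nearest(p, sites):
    """Index of the site whose Thiessen (Voronoi) cell contains point p."""
    return min(range(len(sites)), key=lambda j: math.dist(p, sites[j]))


def mutual_pairs(a, b):
    """Pairs (i, j) where a[i] lies in the cell of b[j] and b[j] in the cell of a[i]."""
    pairs = []
    for i, p in enumerate(a):
        j = nearest(p, b)
        if nearest(b[j], a) == i:
            pairs.append((i, j))
    return pairs


def distance_threshold(a, b, pairs, k=2.0):
    """Assumed rule (not from the paper): mean + k * population std of pair distances."""
    d = [math.dist(a[i], b[j]) for i, j in pairs]
    return statistics.fmean(d) + k * statistics.pstdev(d)


def candidate_matches(a, b, k=2.0):
    """Mutual pairs whose distance does not exceed the threshold (condition one only)."""
    pairs = mutual_pairs(a, b)
    if not pairs:
        return []
    t = distance_threshold(a, b, pairs, k)
    return [(i, j) for i, j in pairs if math.dist(a[i], b[j]) <= t]


# Two small hypothetical datasets: b is a perturbed copy of a plus one unrelated point.
a = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
b = [(0.5, 0.2), (10.3, -0.1), (0.1, 9.6), (50.0, 50.0)]
matches = candidate_matches(a, b)
```

The brute-force nearest-neighbor search is quadratic in the number of points; for larger datasets a spatial index would be the natural replacement, and the surviving candidates would then be ranked by the polygon similarity described in the abstract.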