kNN join is a basic and important operation in spatial databases, and it is widely used in many other fields. It therefore plays a significant role in improving the performance of many practical applications. As the data sets taking part in kNN joins grow and the required response times shrink (especially in emergency settings), a more efficient way to perform kNN join is needed. However, most existing methods run in a single process or on a single machine and do not scale well. To address this problem, we bring the map-reduce framework to kNN join and propose two novel methods: distributed sketched grid based kNN join using map-reduce (DSGMP-J) and Voronoi diagram based kNN join using map-reduce (VDMP-J). We compare them experimentally with a state-of-the-art method, Hadoop block nested loop join (H-BNLJ). The experimental results show that DSGMP-J and VDMP-J scale better than H-BNLJ.
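For readers unfamiliar with the operation, the following is a minimal single-machine sketch of what a kNN join computes: for every point in one data set, the k nearest points in another. It is a brute-force nested loop written only for illustration, not the DSGMP-J, VDMP-J, or H-BNLJ implementation; the function name `knn_join`, the Euclidean distance, and the sample points are assumptions made for this example.

```python
import heapq
from math import dist  # Euclidean distance, available in Python 3.8+


def knn_join(R, S, k):
    """Return, for each point of R (by index), its k nearest points in S.

    Brute force: every point of R is compared with every point of S,
    so the cost grows with len(R) * len(S).
    """
    result = {}
    for i, r in enumerate(R):
        # nsmallest keeps only the k points of S closest to r.
        result[i] = heapq.nsmallest(k, S, key=lambda s: dist(r, s))
    return result


# Hypothetical sample data, chosen only to make the result easy to check.
R = [(0.0, 0.0), (5.0, 5.0)]
S = [(1.0, 0.0), (0.0, 2.0), (4.0, 5.0), (9.0, 9.0)]
neighbours = knn_join(R, S, k=2)
```

The quadratic number of distance computations in this loop is the cost that makes single-machine approaches hard to scale to large data sets.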