With the rapid development of satellite positioning and mobile Internet technologies, geospatial data have become increasingly multi-source and heterogeneous. Given the massive volume of such data, finding the points of interest around a target quickly and effectively has become very important. For spatial k-nearest-neighbor (kNN) queries, the key to efficiency lies in the design of the data index and of the data block storage structure. Building on the MapReduce programming model from cloud computing, we propose a MapReduce-oriented double-layer inverted grid index for geospatial data. The nearest-neighbor query for a target point is computed with the CircularTrip algorithm, which yields the set of data points closest to the target. Experimental results show that kNN queries under the proposed index are clearly more efficient than under a single-layer inverted grid index, that the gain grows with the data volume, and that the method is well suited to large-scale parallel computing.
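To make the idea of a two-level grid index with outward cell expansion concrete, the following is a minimal single-machine sketch and not the implementation evaluated in the paper. It leaves out MapReduce entirely, and it replaces CircularTrip's circular cell traversal with a simpler square-ring expansion around the query cell. The function names (`build_index`, `ring_cells`, `knn`) and the parameters `fine` (fine cell width) and `m` (fine cells per coarse cell side) are assumptions introduced only for illustration.

```python
import heapq
import math


def build_index(points, fine, m):
    """Two-level inverted grid: coarse cell -> fine cell -> list of point ids.

    `fine` is the fine cell width; each coarse cell spans m x m fine cells.
    Both parameters are illustrative assumptions, not values from the paper.
    """
    index = {}
    for pid, (x, y) in enumerate(points):
        fx, fy = math.floor(x / fine), math.floor(y / fine)
        coarse = (fx // m, fy // m)
        index.setdefault(coarse, {}).setdefault((fx, fy), []).append(pid)
    return index


def ring_cells(cx, cy, r):
    """Yield the fine cells at Chebyshev distance r from cell (cx, cy)."""
    if r == 0:
        yield (cx, cy)
        return
    for dx in range(-r, r + 1):          # bottom and top rows, corners included
        yield (cx + dx, cy - r)
        yield (cx + dx, cy + r)
    for dy in range(-r + 1, r):          # left and right columns, corners excluded
        yield (cx - r, cy + dy)
        yield (cx + r, cy + dy)


def knn(index, points, q, k, fine, m):
    """Return the k nearest points to q as sorted (distance, point id) pairs."""
    if k <= 0:
        return []
    qx, qy = math.floor(q[0] / fine), math.floor(q[1] / fine)
    best = []      # max-heap of the current k best, stored as (-distance, id)
    seen = 0       # number of indexed points examined so far
    r = 0
    while seen < len(points):
        for fx, fy in ring_cells(qx, qy, r):
            # An absent coarse cell lets us skip its fine cells with one lookup.
            bucket = index.get((fx // m, fy // m), {}).get((fx, fy), ())
            for pid in bucket:
                seen += 1
                d = math.dist(q, points[pid])
                if len(best) < k:
                    heapq.heappush(best, (-d, pid))
                elif d < -best[0][0]:
                    heapq.heapreplace(best, (-d, pid))
        # Every unvisited cell lies in ring r + 1 or beyond, so its points are
        # farther than r * fine from q; stop once the k-th best is within that.
        if len(best) == k and -best[0][0] <= r * fine:
            break
        r += 1
    return sorted((-nd, pid) for nd, pid in best)
```

A call such as `knn(build_index(pts, 1.0, 4), pts, (1.0, 1.0), 3, 1.0, 4)` returns the three nearest points as `(distance, id)` pairs. In a distributed setting the coarse level would more plausibly be used to partition data across workers, but that mapping is an assumption here and is not shown.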