For rectangular spatial data objects, we design parallel algorithms for index creation, intersection query, and region deletion that are suited to distributed environments. The algorithms build on the traditional CIF quadtree indexing technique and use the Hadoop platform and the MapReduce parallel programming model, following a divide-and-conquer strategy that partitions the data space. On this basis, we run experiments that vary the number of rectangle objects in the data set and the number of map tasks, and analyze the efficiency of parallel index creation and intersection queries. The experimental results show that, for data sets with large data volumes and for multiple data sets, parallel index creation and querying can improve processing efficiency.
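The divide-and-conquer idea summarized above can be illustrated with a minimal single-process sketch. It is not the paper's implementation and does not use the Hadoop API: the partitioning rule (one partition per top-level quadrant, plus a "root" partition for rectangles that straddle quadrant boundaries), the depth limit, and all names (`CIFNode`, `map_phase`, `build_index`, `query_index`) are assumptions made for illustration. Rectangles are `(xmin, ymin, xmax, ymax)` tuples.

```python
from collections import defaultdict

MAX_DEPTH = 8  # assumed depth limit; not taken from the paper


def intersects(a, b):
    """True if closed rectangles a and b share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(cell, r):
    """True if rectangle r lies entirely inside cell."""
    return cell[0] <= r[0] and cell[1] <= r[1] and r[2] <= cell[2] and r[3] <= cell[3]


def quadrants(cell):
    """Split a cell into its four equal quadrants."""
    x0, y0, x1, y1 = cell
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    return [(x0, y0, mx, my), (mx, y0, x1, my), (x0, my, mx, y1), (mx, my, x1, y1)]


class CIFNode:
    """CIF quadtree node: a rectangle is stored at the smallest cell that contains it."""

    def __init__(self, cell, depth=0):
        self.cell = cell
        self.depth = depth
        self.rects = []
        self.children = {}  # quadrant index -> CIFNode, created lazily

    def insert(self, r):
        if self.depth < MAX_DEPTH:
            for i, q in enumerate(quadrants(self.cell)):
                if contains(q, r):
                    child = self.children.get(i)
                    if child is None:
                        child = self.children[i] = CIFNode(q, self.depth + 1)
                    child.insert(r)
                    return
        self.rects.append(r)  # straddles a split line, or depth limit reached

    def query(self, window):
        hits = [r for r in self.rects if intersects(r, window)]
        for child in self.children.values():
            if intersects(child.cell, window):
                hits.extend(child.query(window))
        return hits


def map_phase(world, rects):
    """Simulated map step: emit (partition key, rectangle) pairs."""
    quads = quadrants(world)
    for r in rects:
        key = next((i for i, q in enumerate(quads) if contains(q, r)), "root")
        yield key, r


def build_index(world, rects):
    """Simulated shuffle + reduce step: build one CIF subtree per partition."""
    groups = defaultdict(list)
    for key, r in map_phase(world, rects):
        groups[key].append(r)
    quads = quadrants(world)
    index = {}
    for key, group in groups.items():
        tree = CIFNode(world, 0) if key == "root" else CIFNode(quads[key], 1)
        for r in group:
            tree.insert(r)
        index[key] = tree
    return index


def query_index(index, window):
    """Intersection query: search only the partitions whose cell meets the window."""
    hits = []
    for tree in index.values():
        if intersects(tree.cell, window):
            hits.extend(tree.query(window))
    return hits
```

In this sketch each partition's subtree is built independently, which is what would let the work be spread across reduce tasks; `query_index(build_index(world, rects), window)` then returns every stored rectangle that intersects `window`.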