With the development of mobile positioning technology and the popularity of smartphones, the number of spatio-textual objects on the Internet is growing rapidly. Processing spatial keyword queries efficiently over a massive and continuously growing collection of spatio-textual objects has therefore become a key concern for spatial keyword query applications. Most existing approaches answer spatial keyword queries with hybrid index structures that combine an R-tree with inverted files. However, when confronted with massive and ever-increasing spatio-textual objects, these approaches struggle to support efficient and scalable spatial keyword queries. To address this problem, we propose SK-HBase, a novel HBase-based index structure for spatio-textual data. SK-HBase uses HBase for data storage and, through an effective data allocation strategy, indexes the spatial and textual information of objects at the same time. On the basis of SK-HBase, we propose two spatial keyword query algorithms that ensure the efficiency and scalability of spatial keyword queries over different spatial query scopes. Extensive experiments show that our approach processes spatial keyword queries efficiently over large-scale spatio-textual data and scales well.
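The abstract does not describe how SK-HBase actually allocates data. Purely as an illustrative sketch, assuming a row-key design that is not taken from the paper, the Python below shows one common way a single HBase table can index text and location together: a row key made of a keyword, a Z-order (Morton) grid-cell code, and an object id. Because HBase keeps rows sorted lexicographically by row key, objects that share a keyword and a coarse grid cell occupy a contiguous key range and can be fetched with one prefix scan. The function names `interleave_bits`, `cell_code`, `row_key`, and `scan_prefix` are hypothetical, and a sorted Python list stands in for an HBase table, so no HBase client API is used.

```python
import bisect


def interleave_bits(x: int, y: int, bits: int) -> int:
    """Interleave the low `bits` bits of x and y into a Z-order (Morton) code.

    Bit i of x goes to position 2*i, bit i of y to position 2*i + 1.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def cell_code(lat: float, lon: float, bits: int = 16) -> int:
    """Quantize (lat, lon) onto a 2^bits x 2^bits grid and return its Z-order code."""
    n = 1 << bits
    x = min(int((lon + 180.0) / 360.0 * n), n - 1)
    y = min(int((lat + 90.0) / 180.0 * n), n - 1)
    return interleave_bits(x, y, bits)


def row_key(keyword: str, lat: float, lon: float, obj_id: str, bits: int = 16) -> str:
    """Hypothetical row key: keyword, fixed-width hex cell code, object id."""
    width = (2 * bits + 3) // 4  # number of hex digits needed for 2*bits bits
    return f"{keyword}#{cell_code(lat, lon, bits):0{width}x}#{obj_id}"


def scan_prefix(sorted_keys: list[str], prefix: str) -> list[str]:
    """Simulate an HBase prefix scan over lexicographically sorted row keys."""
    start = bisect.bisect_left(sorted_keys, prefix)
    result = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # sorted order: no later key can match
        result.append(key)
    return result


# Small demo table (sample points chosen for illustration only).
keys = sorted([
    row_key("cafe", 39.90, 116.40, "a"),
    row_key("cafe", 39.91, 116.41, "b"),
    row_key("cafe", 31.20, 121.50, "c"),
    row_key("hotel", 39.90, 116.40, "d"),
])

# Keyword-only query: every "cafe" object, wherever it is.
all_cafes = scan_prefix(keys, "cafe#")

# Keyword + coarse region: keep the first 2 hex digits of a's cell code,
# i.e. the top 4 bits of each grid axis (a 16 x 16 cell).
a_key = row_key("cafe", 39.90, 116.40, "a")
region_prefix = a_key[: len("cafe#") + 2]
nearby_cafes = scan_prefix(keys, region_prefix)
```

Each hex digit of the cell code covers two quadtree levels, so shortening the prefix widens the spatial scope of the scan. A real query would still post-filter the candidates by exact distance or rectangle containment, which this sketch omits.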