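In recent years, the widespread adoption of Web 2.0 and location-aware mobile computing devices has produced large volumes of geographic big data carrying spatio-temporal semantics. Against this background, traditional geocoding services, which rely on map vendors updating gazetteers (place-name and address databases) manually or semi-automatically, can no longer meet the needs of new applications. This paper proposes an approach and method for building an adaptive geocoding engine driven by geographic big data. By introducing Storm, a real-time stream computing platform, it crawls and processes multi-source geographic big data from the Web in real time, accelerates the generation and updating of the gazetteer and related resources, and provides a correspondingly adapted geocoding matching method. On top of the real-time stream computing framework, streaming parallel spatial operations are implemented with the JTS Topology Suite, and a Storm-based geocoding engine prototype is designed and implemented to meet the requirements of efficient processing and geocoding of multi-source geographic big data. Experimental results show that the engine accelerates the expansion and updating of the address database through real-time stream processing, and that continuously updating the address database improves both the matching rate and the positioning accuracy of geocoding.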
As Web 2.0 techniques have spread and location-aware mobile devices have become prevalent, the resulting explosion of geographical data with spatio-temporal characteristics has led to a surge in demand for adaptive geocoding engines driven by Big Geo-Data. Traditional geocoding services, whose gazetteers are maintained manually or semi-automatically by authoritative mapping agencies, cannot satisfy the needs of these new applications. To address the problems of efficiently storing and manipulating massive Geo-Data in GIScience and related fields, we propose a geo-data-driven method for building an adaptive geocoding engine using Storm, a real-time stream computing platform. The engine processes multi-source spatio-temporal data from the Web in real time and accelerates the building and maintenance of gazetteers. On top of these data, an adaptive geocoding matching method is then built. A prototype Storm-based geocoding engine is designed and implemented, which processes and geocodes multi-source Geo-Data effectively. Experiments conducted on POI datasets from Baidu reveal a matching rate of more than 98% and an accuracy rate above 95%, while the average response time per geocoding request is about 75 ms, which is fast enough for practical use. The case studies show that real-time Storm-based streaming spatial operations not only take an order of magnitude less time than traditional stand-alone desktop operations but also raise the matching rate and improve positioning precision, indicating that the proposed solution is both feasible and effective. Our work offers new insights into collecting and processing POI datasets, building and enriching gazetteers, and improving geocoding results in real time using Storm clusters.
This work contributes to the application of real-time streaming computation methods in GIS, advancing the state of the art in Geo-Data computing, analytics, and mining.