This paper proposes a distributed spatio-temporal data storage method based on the Hadoop cloud platform. It targets the big-data storage bottlenecks that arise in spatial applications, such as the inability to serve highly concurrent users with online real-time access and interruptions of spatial information services. The method combines three mechanisms. First, a spatio-temporal data partitioning and placement mechanism distributes data evenly across the cluster so that storage and access loads stay balanced. Second, a spatio-temporal object reorganization mechanism improves the spatio-temporal proximity of stored data to match the access patterns of spatio-temporal applications. Third, a distributed caching mechanism for hotspot (frequently accessed) spatio-temporal objects reduces disk I/O access latency. Using this method, exHDFS, a prototype middleware system for distributed spatio-temporal data storage on the Hadoop cloud platform, was designed and implemented. Experimental results show that exHDFS outperforms the compared methods and can efficiently meet the storage requirements of data-intensive spatial applications.
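The three mechanisms can be illustrated with a minimal Python sketch. It is not the exHDFS implementation, whose algorithms the abstract does not specify. As stand-ins it assumes a Z-order (Morton) key for spatio-temporal proximity, MD5-based hashing for even block placement, and a single-node LRU cache in place of the distributed hotspot cache. All names (`morton_key`, `assign_node`, `HotspotCache`) are hypothetical.

```python
import hashlib
from collections import OrderedDict


def morton_key(x: int, y: int, t: int, bits: int = 10) -> int:
    """Interleave the bits of grid indices (x, y, t) into one Z-order key.

    Objects that are close in space and time tend to get nearby keys, so
    sorting by this key groups them together (an assumed stand-in for the
    paper's reorganization mechanism).
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)
        key |= ((y >> i) & 1) << (3 * i + 1)
        key |= ((t >> i) & 1) << (3 * i + 2)
    return key


def assign_node(block_id: int, num_nodes: int) -> int:
    """Hash a block id to a node index so blocks spread roughly evenly
    (an assumed stand-in for the paper's placement mechanism)."""
    digest = hashlib.md5(str(block_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


class HotspotCache:
    """Small LRU cache: frequently read objects are served from memory
    instead of disk. A real deployment would shard this across nodes."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key, load_from_disk):
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        value = load_from_disk(key)  # slow path: simulated disk read
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used
        return value
```

In this sketch, objects would be sorted by `morton_key`, packed into blocks, placed with `assign_node`, and read through `HotspotCache.get`, so that repeated reads of popular objects avoid the disk path.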