Traffic flow data is multi-source, high-rate and high-volume, and traditional data storage methods and systems handle it poorly: they scale weakly and cannot store data in real time. To address these problems, this work designs and implements an HBase-based real-time storage system for traffic flow data. The system adopts a distributed storage architecture. Front-end preprocessing normalizes the incoming data, a multi-source buffer structure divides different types of streaming data into separate queues, and a combination of consistent hashing, multithreading and row-key optimization writes the data to the HBase cluster servers in parallel. Experimental results show that, compared with an Oracle-based real-time storage system, the proposed system achieves 3 to 5 times higher storage performance. Compared with native HBase, it achieves 2 to 3 times higher storage performance, and it also scales well.
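The abstract names consistent hashing as one of the strategies for spreading writes across the cluster but does not say what is hashed or which targets sit on the ring. The following is a minimal sketch under those assumptions: a generic consistent hash ring with virtual nodes that routes a record key (a hypothetical detector ID) to one of several write queues. The class name `ConsistentHashRing`, the MD5-based `ring_hash`, the `replicas=100` default and the queue names are illustrative choices, not details of the system described.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    # Assumed hash: first 4 bytes of MD5, giving a position on a 32-bit ring.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")


class ConsistentHashRing:
    """Hypothetical ring that maps record keys to write targets (queues or servers)."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas  # virtual nodes per physical node, smooths the key distribution
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.replicas):
            point = ring_hash(f"{node}#{i}")
            if point in self._owner:
                continue  # skip a rare collision instead of stealing another node's point
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove_node(self, node):
        for i in range(self.replicas):
            point = ring_hash(f"{node}#{i}")
            if self._owner.get(point) == node:
                del self._owner[point]
                self._points.remove(point)

    def get_node(self, key):
        # Walk clockwise to the first ring point after the key's hash, wrapping past the end.
        if not self._points:
            raise ValueError("ring has no nodes")
        idx = bisect.bisect(self._points, ring_hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


if __name__ == "__main__":
    # Hypothetical targets: one write queue per HBase writer thread.
    ring = ConsistentHashRing(["queue-0", "queue-1", "queue-2"])
    for detector_id in ["detector-0001", "detector-0002", "detector-0003"]:
        print(detector_id, "->", ring.get_node(detector_id))
```

The property that makes consistent hashing attractive for this kind of routing is that adding or removing a queue or server remaps only the keys adjacent to its ring points. A plain `hash(key) % n` scheme would instead reshuffle most keys whenever `n` changes.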