Traditional relational databases cannot efficiently store and query massive volumes of air quality monitoring data. To address this, a big data storage schema based on HBase is designed. The schema stores real-time readings of the monitored air quality items as timestamped, multi-version cells in the RTData column. Hourly and daily average values for each station and urban district, together with their evaluations, are computed from the real-time data and written to the corresponding data columns. A row key that combines area code, station code, and time dimensions distributes the data sensibly across multiple HRegions and the store files they contain, so that queries can be served efficiently. Experimental results show that the schema fully meets the storage requirements for real-time data, hourly average data with evaluations, and daily average data with evaluations, as well as the query requirements of the business logic, which demonstrates the feasibility of the schema.
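The row key design described above can be illustrated with a minimal sketch. The abstract names only the three dimensions (area code, station code, time); the field order, the fixed widths, the hour-level time granularity, and the names `make_row_key` and `scan_range` are all assumptions made for illustration and are not taken from the schema itself. The sketch uses plain Python and does not call any HBase client.

```python
from datetime import datetime
from typing import Tuple

# Assumed fixed widths; the abstract does not specify them.
AREA_WIDTH = 6
STATION_WIDTH = 4


def make_row_key(area_code: str, station_code: str, when: datetime) -> bytes:
    """Concatenate area code, station code, and hour into one row key.

    Fixed-width fields keep the byte-wise (lexicographic) order of keys
    aligned with (area, station, time) order, which is how HBase sorts rows.
    """
    if len(area_code) != AREA_WIDTH or len(station_code) != STATION_WIDTH:
        raise ValueError("area and station codes must have the assumed fixed widths")
    return f"{area_code}{station_code}{when:%Y%m%d%H}".encode("ascii")


def scan_range(area_code: str, station_code: str,
               start: datetime, stop: datetime) -> Tuple[bytes, bytes]:
    """Return (start_row, stop_row) bounds for one station's time window."""
    return (make_row_key(area_code, station_code, start),
            make_row_key(area_code, station_code, stop))


# Hypothetical codes, used only to show the key layout.
key = make_row_key("110105", "0012", datetime(2015, 3, 1, 8))
start_row, stop_row = scan_range("110105", "0012",
                                 datetime(2015, 3, 1, 0), datetime(2015, 3, 2, 0))
```

Under these assumptions, all rows of one station are contiguous and ordered by time, so a query for one station over a time window becomes a single range scan between `start_row` and `stop_row`. Leading with the area code keeps rows from different areas in separate key ranges, which is one way data could spread across HRegions.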