To address the storage and efficient querying of massive time-series data in power system monitoring, a multi-level partition query optimization method with fault-tolerant storage is proposed, built on a cloud computing framework and the HQL (Hive SQL) query engine. A fault-tolerant storage scheme for power monitoring data is designed using a replica mechanism. Loading and query optimization tests are then carried out by combining HQL query plan generation, translation of HQL into Map/Reduce jobs, and partition pruning. Results show that HQL processing performance far exceeds that of SQL once the loaded monitoring data exceed 2 million records and the queried data exceed 3.8 million records, and the advantage grows with data volume. Multi-level partition query tests show that, for comparable query times, partitioned queries can handle two orders of magnitude more data, and two-level partitioning is more efficient than one-level partitioning. These results confirm that the query optimization techniques effectively improve the efficiency of querying power system monitoring information, and they provide a query optimization method for processing large volumes of power monitoring data.
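The abstract does not give the table schema or partition keys. The minimal sketch below makes a hypothetical assumption: readings are partitioned first by day and then by station, mirroring a Hive table declared with `PARTITIONED BY (dt STRING, station STRING)` and its `dt=.../station=...` directory layout. It simulates partition pruning in plain Python rather than running Hive. The names `Reading`, `build_partitions`, `prune` and `query` are illustrative only. The sketch shows how many rows must be scanned for one query under zero, one, and two partition levels.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Reading:
    day: date        # hypothetical first-level partition key
    station: str     # hypothetical second-level partition key
    value: float     # measured quantity (e.g. a voltage sample)


def partition_path(r: Reading, levels: int) -> str:
    """Hive-style partition directory, e.g. 'dt=2015-01-04/station=S2'."""
    parts = []
    if levels >= 1:
        parts.append(f"dt={r.day.isoformat()}")
    if levels >= 2:
        parts.append(f"station={r.station}")
    return "/".join(parts)


def build_partitions(readings, levels):
    """Group readings into partitions; levels=0 means one unpartitioned table."""
    partitions = defaultdict(list)
    for r in readings:
        partitions[partition_path(r, levels)].append(r)
    return dict(partitions)


def prune(partitions, day, station):
    """Keep only partitions whose keys do not contradict the WHERE predicates."""
    kept = {}
    for path, rows in partitions.items():
        keys = dict(kv.split("=", 1) for kv in path.split("/") if kv)
        if keys.get("dt", day.isoformat()) != day.isoformat():
            continue
        if keys.get("station", station) != station:
            continue
        kept[path] = rows
    return kept


def query(partitions, day, station):
    """Return (matching rows, number of rows scanned after pruning)."""
    kept = prune(partitions, day, station)
    scanned = sum(len(rows) for rows in kept.values())
    rows = [r for part in kept.values() for r in part
            if r.day == day and r.station == station]
    return rows, scanned


# Synthetic data: 10 days x 5 stations x 100 readings = 5000 rows.
readings = [Reading(date(2015, 1, 1) + timedelta(days=d), f"S{s}", float(i))
            for d in range(10) for s in range(5) for i in range(100)]

target_day = date(2015, 1, 4)
for levels in (0, 1, 2):
    rows, scanned = query(build_partitions(readings, levels), target_day, "S2")
    print(f"levels={levels}: matched={len(rows)} scanned={scanned}")
```

All three layouts return the same 100 matching rows. The rows scanned fall from 5000 (no partitioning) to 500 (partitioned by day) to 100 (partitioned by day and station). This is the intuition behind two-level partitioning outperforming one-level partitioning. These ratios come from the synthetic data, not from the paper's measurements.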