To meet the processing demands of electricity consumption big data, a data processing architecture and a computing service architecture for the electricity consumption information acquisition system are built on a Hadoop cluster. Computing tasks are parallelized, which improves the efficiency of computation on electricity consumption big data. Because most computing services in the acquisition system have complex calculation rules and data dependencies, the MapReduce jobs within one computing service and the dependencies among them are organized as a directed acyclic graph. A dependency control engine automatically manages job run states and dynamically adjusts job submission times, improving both job execution efficiency and job success rate. Experimental results show that the proposed method is efficient and feasible, delivers high computing service performance, and can meet the real-time processing requirements of electricity consumption big data.
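The dependency-driven scheduling summarized above can be illustrated with a minimal sketch. It is not the system's actual engine: the function `run_service`, the `State` values, and the `run_job` callback (a stand-in for submitting one MapReduce job and waiting for its outcome) are all hypothetical names introduced here. The sketch assumes that a job is submitted only after every job it depends on has succeeded, which is one plausible reading of adjusting submission time dynamically, and that jobs downstream of a failure are skipped rather than submitted.

```python
from collections import deque
from enum import Enum


class State(Enum):
    WAITING = "waiting"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    SKIPPED = "skipped"


def run_service(jobs, deps, run_job):
    """Run the jobs of one computing service in dependency order.

    jobs:    list of job names (hypothetical identifiers)
    deps:    dict mapping a job to the set of jobs it depends on;
             every dependency is assumed to appear in `jobs`
    run_job: callable(name) -> bool, a stand-in for submitting a
             MapReduce job and reporting whether it succeeded
    Returns (submitted, state): the submission order and final states.
    """
    jobs = list(jobs)
    state = {j: State.WAITING for j in jobs}
    pending = {j: set(deps.get(j, ())) for j in jobs}  # unfinished parents
    children = {j: [] for j in jobs}
    for job, parents in pending.items():
        for parent in parents:
            children[parent].append(job)

    # Jobs without dependencies can be submitted immediately.
    ready = deque(j for j in jobs if not pending[j])
    submitted = []
    finished = 0
    while ready:
        job = ready.popleft()
        parents_ok = all(state[p] is State.SUCCEEDED for p in deps.get(job, ()))
        if parents_ok:
            submitted.append(job)
            state[job] = State.SUCCEEDED if run_job(job) else State.FAILED
        else:
            # An upstream job failed or was skipped, so do not submit.
            state[job] = State.SKIPPED
        finished += 1
        # Release children whose last unfinished parent was this job.
        for child in children[job]:
            pending[child].discard(job)
            if not pending[child]:
                ready.append(child)

    if finished != len(jobs):
        # Some jobs never became ready, so the graph is not acyclic.
        raise ValueError("dependency graph contains a cycle")
    return submitted, state
```

For example, with hypothetical jobs `clean`, `daily_load`, `line_loss`, and `report`, where `report` depends on the two middle jobs and both of those depend on `clean`, the sketch submits `clean` first and `report` last. If `line_loss` fails, `report` is marked as skipped instead of being submitted against incomplete input.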