With the development of smart grids, communication networks, and sensor technology, electric power user-side data is growing exponentially in volume and increasing in complexity, gradually forming user-side big data. Traditional data analysis models can no longer meet this demand, so new approaches to analyzing and processing power user-side big data are urgently needed. This paper analyzes the sources of power user-side big data and, based on its characteristics of large volume, wide variety, and high velocity, identifies the challenges it poses for data storage, availability, and processing. Drawing on cloud computing technology, an analysis and processing platform for power user-side big data is proposed; it integrates data collected from smart meters, SCADA systems, and various sensors, and analyzes the data with the parallel computing model MapReduce and the in-memory parallel computing framework Spark. A parallel load forecasting method based on the random forest algorithm is proposed: the random forest algorithm is parallelized and applied to historical load, temperature, wind speed, and other data, which shortens load forecasting time and improves the algorithm's capacity to handle big data. A Hadoop-based prototype system for parallel load forecasting on power user-side big data is designed and implemented, with functions including cluster management, data management, and a library of prediction and classification algorithms. Load forecasting experiments with the parallelized random forest algorithm are carried out on data sets of different sizes. The results show that its prediction accuracy is significantly higher than that of a decision tree, that this advantage holds generally across the different data sets, and that the algorithm is well suited to analyzing and processing big data.
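The idea of parallelizing a random forest for load forecasting can be illustrated with a minimal sketch. It is not the paper's Hadoop or Spark implementation: a local thread pool stands in for cluster workers, the "map" step trains one regression tree per bootstrap sample, and the "reduce" step averages the tree predictions. All function names, parameters, and the synthetic relation between load, temperature, and wind speed in `make_data` are assumptions made purely for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def avg(values):
    values = list(values)
    return sum(values) / len(values)


def sse(rows):
    """Sum of squared errors of the loads around their mean."""
    m = avg(y for _, y in rows)
    return sum((y - m) ** 2 for _, y in rows)


def build_tree(rows, rng, depth=0, max_depth=5, min_size=5, n_try=2, n_cuts=10):
    """Grow a regression tree; each split tries a random subset of features."""
    if depth >= max_depth or len(rows) < 2 * min_size:
        return avg(y for _, y in rows)  # leaf: mean load of the node
    n_features = len(rows[0][0])
    best = None
    for f in rng.sample(range(n_features), min(n_try, n_features)):
        # Candidate cut points come from a few randomly chosen rows.
        for x, _ in rng.sample(rows, min(n_cuts, len(rows))):
            cut = x[f]
            left = [r for r in rows if r[0][f] <= cut]
            right = [r for r in rows if r[0][f] > cut]
            if len(left) < min_size or len(right) < min_size:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, cut, left, right)
    if best is None:
        return avg(y for _, y in rows)
    _, f, cut, left, right = best
    return (f, cut,
            build_tree(left, rng, depth + 1, max_depth, min_size, n_try, n_cuts),
            build_tree(right, rng, depth + 1, max_depth, min_size, n_try, n_cuts))


def predict_tree(node, x):
    while isinstance(node, tuple):  # internal nodes are tuples, leaves are floats
        f, cut, left, right = node
        node = left if x[f] <= cut else right
    return node


def train_one(task):
    """'Map' step: train one tree on a bootstrap sample with its own seed."""
    rows, seed = task
    rng = random.Random(seed)
    sample = [rng.choice(rows) for _ in rows]
    return build_tree(sample, rng)


def train_forest(rows, n_trees=8, workers=4):
    # Threads stand in for cluster workers; trees are independent of each other.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(train_one, [(rows, s) for s in range(n_trees)]))


def predict_forest(forest, x):
    """'Reduce' step: average the predictions of all trees."""
    return avg(predict_tree(tree, x) for tree in forest)


def make_data(n, seed=0):
    """Synthetic (previous load, temperature, wind speed) -> load rows (assumed)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        prev_load = rng.uniform(80, 120)
        temp = rng.uniform(0, 35)
        wind = rng.uniform(0, 15)
        load = 0.6 * prev_load + 1.5 * temp - 0.8 * wind + rng.gauss(0, 2)
        rows.append(((prev_load, temp, wind), load))
    return rows


if __name__ == "__main__":
    train, held_out = make_data(200, seed=1), make_data(50, seed=2)
    forest = train_forest(train)
    mae = avg(abs(predict_forest(forest, x) - y) for x, y in held_out)
    print(f"mean absolute error on held-out rows: {mae:.2f}")
```

Because each tree depends only on its own bootstrap sample and seed, the training tasks can be distributed without coordination, which is the property a MapReduce or Spark job would exploit. In CPython, threads do not speed up this CPU-bound work; the thread pool only shows the structure of the computation.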