This paper examines the large-scale data streams that have emerged in power systems in recent years, with the aim of using stream computing technology to improve the real-time performance and security of the power system. It studies fast clustering and anomaly detection for the electricity consumption data streams generated by large-scale power consumption information collection. Using the distributed stream computing platform Spark Streaming, a streaming DBSCAN clustering algorithm is designed and implemented. The algorithm exploits the clustering properties that electricity consumption behavior exhibits along the longitudinal (time) and transverse (space) dimensions: users in the same class have similar consumption patterns, and a single user's historical consumption data are similar to one another. On this basis the algorithm achieves fast anomaly detection on large-scale electricity consumption data streams. An experimental environment supporting large-scale data stream processing was designed and built, and experiments in this environment demonstrate the effectiveness of the algorithm.
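The core idea, that points DBSCAN labels as noise are candidate anomalies, can be illustrated with a small single-machine sketch. This is not the authors' streaming DBSCAN: it is a plain-Python DBSCAN run on each incoming micro-batch together with a sliding window of recent batches, so that a new reading is compared both with other users (spatial similarity) and with recent history (temporal similarity). `eps` and `min_pts` are the standard DBSCAN parameters; the class `StreamingAnomalyDetector`, its `window_batches` parameter, and the toy two-dimensional load features are hypothetical. In a Spark Streaming deployment, logic of this kind would run once per micro-batch (for example inside a DStream `foreachRDD` callback), but that wiring is an assumption and is not shown.

```python
import math
from collections import deque
from typing import List, Optional, Sequence

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance between two load-feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dbscan(points: List[Point], eps: float, min_pts: int) -> List[int]:
    """Classic DBSCAN; returns a cluster id per point, with -1 marking noise.

    Region queries are brute force (O(n^2)), which is fine for a sketch
    but not for production-scale streams.
    """
    labels: List[Optional[int]] = [None] * len(points)

    def region(i: int) -> List[int]:
        # Neighborhood includes the point itself (distance 0 <= eps).
        return [j for j in range(len(points)) if euclidean(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        labels[i] = cluster
        seeds = deque(neighbors)
        while seeds:
            j = seeds.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border point
            if labels[j] is not None:
                continue  # already assigned (border points are not expanded)
            labels[j] = cluster
            j_neighbors = region(j)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                seeds.extend(j_neighbors)
        cluster += 1
    return [lab for lab in labels]  # every entry is now an int


class StreamingAnomalyDetector:
    """Hypothetical helper: DBSCAN over the current batch plus recent batches."""

    def __init__(self, eps: float, min_pts: int, window_batches: int) -> None:
        self.eps = eps
        self.min_pts = min_pts
        self.window: deque = deque(maxlen=window_batches)  # recent batches only

    def process_batch(self, batch: List[Point]) -> List[int]:
        """Return indices (within `batch`) of readings labeled as noise."""
        history = [p for b in self.window for p in b]
        labels = dbscan(history + list(batch), self.eps, self.min_pts)
        new_labels = labels[len(history):]
        self.window.append(list(batch))  # slide the window forward
        return [i for i, lab in enumerate(new_labels) if lab == -1]
```

For example, with `eps=0.5` and `min_pts=3`, a batch of four similar load profiles plus one far-away reading flags only the far-away reading; a later batch is judged against the retained history, so one normal reading joins the existing cluster while an isolated one is flagged. Keeping a bounded window rather than all history limits memory and lets the notion of "normal" drift over time, at the cost of forgetting older patterns.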