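Automatic generation control (AGC) is a dynamic multistage decision problem, namely a Markov decision process (MDP), so reinforcement learning algorithms can be applied to learn control strategies online and to optimize decisions dynamically. Q-learning is introduced as the core reinforcement learning algorithm. The control performance standard (CPS) value is treated as the "reward" given by the "environment", that is, the power system containing the AGC, and online learning is realized through the closed-loop control structure formed by the reward-based Q function and the CPS control actions. The learning objective is to maximize the long-term cumulative reward that the CPS control actions obtain from the environment, so that the output of the CPS control system is optimized online, quickly and automatically. Simulation studies show that introducing reinforcement-learning self-tuning control markedly enhances the robustness and adaptability of the whole AGC system and effectively raises the CPS compliance rate.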
The automatic generation control (AGC) problem is a stochastic multistage decision problem that can be modeled as a Markov decision process (MDP). This paper introduces Q-learning as the core reinforcement learning (RL) algorithm and treats the control performance standard (CPS) values as the rewards from the interconnected power system. By regulating a closed-loop CPS control rule to maximize the total reward during on-line learning, the optimal CPS control strategy is obtained gradually. The case study shows that adding RL control markedly enhances the robustness and adaptability of the AGC system and ensures CPS compliance. This work is supported by the National Natural Science Foundation of China (No. 50807016) and the Natural Science Funds of Guangdong Province (No. 06300091).
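The closed loop described in the abstract (observe state, apply a control action, receive a reward, update the Q function) can be illustrated with a minimal tabular Q-learning sketch. This is not the paper's controller: the five-state environment `toy_env_step`, its reward, and the values of `alpha`, `gamma` and `epsilon` are all hypothetical stand-ins chosen for illustration, and the reward is only a placeholder for a CPS-derived reward.

```python
import random


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy selection: explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[state]))
    row = q[state]
    return row.index(max(row))


def toy_env_step(state, action):
    """Hypothetical stand-in for the power system "environment".

    States 0..4 index a discretized control error, with state 2 meaning
    "on target". Actions 0, 1, 2 mean lower, hold, raise generation.
    The reward is an assumed placeholder for a CPS-based reward: zero when
    on target, more negative the further the state is from it.
    """
    next_state = min(4, max(0, state + (action - 1)))
    reward = -abs(next_state - 2)
    return next_state, reward


def train(episodes=2000, steps=20, epsilon=0.2, seed=0):
    """Run on-line learning against the toy environment and return the Q table."""
    rng = random.Random(seed)
    q = [[0.0] * 3 for _ in range(5)]  # 5 states x 3 actions
    for _ in range(episodes):
        state = rng.randrange(5)
        for _ in range(steps):
            action = choose_action(q, state, epsilon, rng)
            next_state, reward = toy_env_step(state, action)
            q_update(q, state, action, reward, next_state)
            state = next_state
    return q


if __name__ == "__main__":
    q_table = train()
    greedy_policy = [row.index(max(row)) for row in q_table]
    print(greedy_policy)
```

In this toy setting the greedy policy read from the learned table should steer every state toward the on-target state and hold there, which mirrors the abstract's goal of maximizing the long-term cumulative reward. A real AGC application would replace `toy_env_step` with the power system model and a reward computed from the CPS indices.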