Optimal CPS Control for Interconnected Power Systems Based on Whole-Process R(λ) Learning with an Average Reward Model
  • Journal: Automation of Electric Power Systems (电力系统自动化), 34(21), pp. 27-33, 2010-11-10.
  • Classification: TP181 [Automation and Computer Technology: Control Science and Engineering; Control Theory and Control Engineering]
  • Author affiliation: [1] School of Electric Power, South China University of Technology, Guangzhou 510640, Guangdong, China
  • Funding: National Natural Science Foundation of China (50807016); Guangdong Natural Science Foundation (9151064101000049); Fundamental Research Funds for the Central Universities (2009ZM0251)
  • Related project: Optimal Relaxation Control of AGC under the CPS Standard and Its Markov Decision Process
Chinese abstract (translated):

A novel optimal CPS control method for interconnected power systems is proposed, based on whole-process R(λ) learning under an average reward model. The method matches the objective of automatic generation control (AGC), which pursues a high compliance rate of the 10-minute-average Control Performance Standard (CPS) index over the assessment period. Compared with the Q(λ) learning algorithm based on a discounted reward model, the proposed average-reward R(λ) learning algorithm converges faster in online learning and achieves better CPS indices. Furthermore, the improved R(λ) controller learns online throughout the whole process: its pre-learning phase is replaced by a novel online "imitation learning" scheme, which removes a serious drawback of conventional reinforcement-learning control, namely the need to build a separate simulation model for pre-learning convergence, and thereby improves the learning efficiency of the R(λ) controller and its applicability to real power systems.
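For context on the control objective, the CPS compliance rate mentioned above is assessed against NERC's CPS indices. The sketch below computes the CPS1 index from its standard definition; the function name and arguments are illustrative, and the paper's 10-minute assessment windowing is not modeled here.

```python
def cps1_percent(ace, delta_f, bias_mw_per_0_1hz, eps1_hz):
    """CPS1 index (NERC-style) from per-minute averages.

    ace: list of 1-minute average Area Control Error values (MW)
    delta_f: list of 1-minute average frequency deviations (Hz)
    bias_mw_per_0_1hz: frequency bias B (MW per 0.1 Hz, negative by convention)
    eps1_hz: the epsilon-1 frequency-error bound (Hz)
    """
    # Compliance factor CF: mean of (ACE / (-10B)) * delta_f, normalized by eps1^2
    cf_terms = [(a / (-10.0 * bias_mw_per_0_1hz)) * df
                for a, df in zip(ace, delta_f)]
    cf = (sum(cf_terms) / len(cf_terms)) / (eps1_hz ** 2)
    # CPS1 = (2 - CF) * 100%; 200% is the best possible score (ACE = 0)
    return (2.0 - cf) * 100.0
```

A control area is CPS1-compliant when the index stays at or above 100%, which is why an AGC strategy that drives ACE against frequency deviation raises the compliance rate.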

English abstract:

The R(λ)-learning algorithm is based on the average reward model. A novel optimal CPS control methodology for interconnected power systems, based on whole-process R(λ)-learning, is presented. The objective of this CPS control methodology coincides with that of AGC, which pursues high CPS compliance in every ten-minute period. Moreover, the R(λ)-learning algorithm converges faster and attains a higher CPS index value than the Q(λ)-learning algorithm, which is based on a discounted reward model. In addition, the improved controller based on the novel R(λ)-learning algorithm learns online throughout the whole process, with its pre-learning stage replaced by an imitation-learning process. The improved controller thus overcomes a serious defect of conventional reinforcement-learning controllers, which need an accurate simulation model in order to converge during pre-learning, and it enhances learning efficiency and applicability in power systems.
