Reinforcement Learning in Continuous Spaces Based on a Gaussian Process Classifier

Journal: Acta Electronica Sinica (电子学报)
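Pages: 1153-1158
Language: Chinese
Classification: TP18 [Automation and Computer Technology: Control Science and Engineering; Automation and Computer Technology: Control Theory and Control Engineering]
Author affiliations: [1] School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221116; [2] Institute of Automation, Chinese Academy of Sciences, Beijing 100190
Funding: Program for New Century Excellent Talents in University, Ministry of Education (No. NCET-08-0836); National Natural Science Foundation of China (No. 60804022); Natural Science Foundation of Jiangsu Province (No. BK2008126); Specialized Research Fund for the Doctoral Program of Higher Education (No. 20070290537, 200802901506); China Postdoctoral Science Foundation (No. 20070411064)
Related project: Research on Reinforcement Learning Control of Complex Continuous Systems Based on Support Vector Machines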

Keywords: Gaussian process, classifier, continuous space, reinforcement learning, boat problem

Chinese abstract (translated):

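How to extend reinforcement learning methods to large-scale or continuous spaces is key to whether they can be widely applied. Unlike existing value-function approximation methods, this paper formulates reinforcement learning as a simple binary classification problem and uses a classification algorithm to obtain the policy, proposing a reinforcement learning method for continuous state and continuous action spaces based on a Gaussian process classifier. First, the continuous action space is discretized into a fixed number of discrete actions. A Gaussian process classifier then labels each continuous-state/discrete-action pair as positive or negative, and the discrete actions classified as positive are summed, weighted by their probability values, to give the continuous action actually applied to the system. Simulation results on the boat problem show that the proposed method effectively addresses the continuous-space representation problem in reinforcement learning.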

English abstract:

The generalization of reinforcement learning methods to large-scale or continuous spaces has become a major focus of reinforcement learning research. Unlike existing reinforcement learning methods for continuous spaces, which rely on value-function approximation, reinforcement learning is formulated here as a simple binary classification problem. A reinforcement learning method for continuous state and action spaces based on a Gaussian process classifier is proposed, in which a classification algorithm is used to obtain the control policy. First, the continuous action space is discretized into a fixed number of discrete actions, and the Gaussian process classifier is used to predict the class probability of each continuous-state/discrete-action pair. A continuous action is then generated as a weighted combination of the positive actions, using their probability values as weights. Computer simulations of a boat problem illustrate the validity of the proposed method.
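
The action-generation step described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation. The names `discretize_actions`, `continuous_action`, and `toy_classifier` are hypothetical, and the trained Gaussian process classifier is replaced by a hand-written probability function so the example stays self-contained. Two details are assumptions that the abstract does not specify: the weights are normalized so the result stays within the action range, and the most probable action is used when no action is classified as positive.

```python
from typing import Callable, List, Sequence


def discretize_actions(low: float, high: float, n: int) -> List[float]:
    """Split the continuous action interval [low, high] into n evenly spaced actions."""
    if n < 2:
        raise ValueError("need at least two discrete actions")
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]


def continuous_action(
    state: Sequence[float],
    actions: Sequence[float],
    prob_positive: Callable[[Sequence[float], float], float],
    threshold: float = 0.5,
) -> float:
    """Blend the discrete actions classified as positive into one continuous action.

    Assumptions (not stated in the abstract): weights are normalized, and the
    most probable action is returned if no action exceeds the threshold.
    """
    # Probability that each (state, discrete action) pair belongs to the positive class.
    probs = [prob_positive(state, a) for a in actions]
    positive = [(a, p) for a, p in zip(actions, probs) if p > threshold]
    if not positive:
        best = max(range(len(actions)), key=lambda i: probs[i])
        return actions[best]
    total = sum(p for _, p in positive)
    # Probability-weighted sum of the positive actions, normalized by total weight.
    return sum(a * p for a, p in positive) / total


def toy_classifier(state: Sequence[float], action: float) -> float:
    """Hypothetical stand-in for a trained classifier: favors actions near -state[0]."""
    target = -state[0]
    return max(0.0, 1.0 - abs(action - target))


actions = discretize_actions(-1.0, 1.0, 5)  # [-1.0, -0.5, 0.0, 0.5, 1.0]
u = continuous_action([0.3], actions, toy_classifier)
print(f"continuous action: {u:.4f}")
```

In a real system, `toy_classifier` would be replaced by a classifier trained on state-action pairs that returns the positive-class probability. One possible option is scikit-learn's `GaussianProcessClassifier` with its `predict_proba` method, though the paper's own classifier and training procedure are not given in this record.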
