The state variables of real-world problems are usually continuous and real-valued, whereas standard reinforcement learning methods apply only to problems with a finite set of discrete states. Applying reinforcement learning to real-world problems therefore requires a suitable representation of continuous states. There are two main approaches: parameterized function approximation and adaptive discretization. Based on an analysis of the advantages and disadvantages of existing methods for adaptively partitioning continuous state spaces, a partition method based on a node-growing k-means clustering algorithm is proposed. The algorithm steps of the resulting reinforcement learning method are given for both the discrete-action and the continuous-action case. Simulation experiments are conducted on the Mountain-Car problem with discrete actions and on the double integrator problem with continuous actions. The results show that the proposed method automatically adjusts the partition resolution according to the distribution of states in the continuous space, achieves an adaptive partition of the continuous state space, and learns the optimal policy.
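The abstract does not spell out the node-growing rule, so the following is only a minimal sketch of one plausible reading: an online k-means in which a new node is created whenever an observed state lies farther than a threshold from every existing node, and the nearest node is otherwise pulled toward the state. The class name `NodeGrowingKMeans`, the parameters `growth_threshold` and `learning_rate`, and the growth criterion itself are assumptions made for illustration, not details taken from the paper.

```python
import math


class NodeGrowingKMeans:
    """Online k-means whose set of nodes grows when a state is far from all nodes.

    Hypothetical sketch: the distance-threshold growth rule and the fixed
    learning rate are illustrative assumptions, not the paper's algorithm.
    """

    def __init__(self, growth_threshold, learning_rate=0.1):
        self.growth_threshold = growth_threshold
        self.learning_rate = learning_rate
        self.centers = []  # one list of floats per node

    def nearest(self, state):
        """Return (index, distance) of the closest node, or (None, inf) if none exist."""
        best_index, best_distance = None, math.inf
        for index, center in enumerate(self.centers):
            distance = math.sqrt(sum((c - s) ** 2 for c, s in zip(center, state)))
            if distance < best_distance:
                best_index, best_distance = index, distance
        return best_index, best_distance

    def update(self, state):
        """Map `state` to a node index, growing a new node if it is too far away."""
        index, distance = self.nearest(state)
        if index is None or distance > self.growth_threshold:
            # Assumed growth rule: the state itself becomes a new node.
            self.centers.append(list(state))
            return len(self.centers) - 1
        # Standard online k-means step: move the winning node toward the state.
        center = self.centers[index]
        for d, value in enumerate(state):
            center[d] += self.learning_rate * (value - center[d])
        return index


partition = NodeGrowingKMeans(growth_threshold=0.5, learning_rate=0.5)
cells = [partition.update(s) for s in [(0.0, 0.0), (0.2, 0.0), (2.0, 2.0)]]
```

Under this reading, the index returned by `update` could serve as the discrete state of a tabular learner, so that frequently visited regions of the state space accumulate more nodes and hence a finer partition.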