A Reinforcement Learning Method Based on a Node-Growing k-Means Clustering Algorithm

Journal: Journal of Computer Research and Development (计算机研究与发展), 2006, 34(4): 661-666
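Classification: TP18 [Automation and Computer Technology: Control Science and Engineering; Automation and Computer Technology: Control Theory and Control Engineering]
Author affiliation: [1] Department of Automation, University of Science and Technology of China, Hefei 230027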
Funding: National Natural Science Foundation of China (60575033)

Research on intelligent control systems, which focuses on mimicking human intelligence and embedding intelligent abilities into control systems, is expected to make up for shortcomings of current control theory and to bring a breakthrough in applying control theory to complex industrial process control problems. Among all intelligent abilities, learning is not only a major way to acquire knowledge and raise the level of intelligence, but also a notable indication of human intelligence. Learning ability is also a major characteristic that separates intelligent control systems from conventional control systems, and learning control is an important branch of intelligent control research. Learning control systems progressively improve their control performance based on interactions with plants and on their previous experience. Learning control is usually suitable for solving uncertainty problems caused by plant nonlinearity and imperfect system modeling, and for alleviating difficulties caused by a lack of necessary a priori knowledge.
Related project: Research on the Theory, Algorithms, and Typical Applications of Grey Qualitative Simulation

Keywords: reinforcement learning, k-means clustering algorithm, Sarsa learning, continuous state representation

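Abstract (translated from the Chinese):

There are two main approaches to reinforcement learning problems with continuous states: parameterized function approximation and adaptive discretization. Based on an analysis of the advantages and disadvantages of existing methods for adaptively partitioning a continuous state space, a partitioning method based on a node-growing k-means clustering algorithm is proposed, and the algorithm steps of the resulting reinforcement learning method are given for both the discrete-action and the continuous-action case. Simulation experiments are carried out on the Mountain-Car problem with discrete actions and on the double integrator problem with continuous actions. The results show that the method automatically adjusts the partition resolution according to the distribution of states in the continuous space, achieves an adaptive partition of the continuous state space, and learns the optimal policy.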
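
The record does not reproduce the paper's algorithm steps, so the following is only a minimal sketch of what a node-growing partition of a continuous state space might look like. It assumes an online k-means variant in which a new node is created whenever a state lies farther than a distance threshold from every existing center. The class name `NodeGrowingKMeans`, the parameters `grow_threshold` and `learning_rate`, and the growth rule itself are illustrative assumptions, not the authors' method.

```python
import math


class NodeGrowingKMeans:
    """Online k-means whose set of centers grows on demand (illustrative sketch)."""

    def __init__(self, grow_threshold, learning_rate=0.1):
        # Both parameters are assumptions made for this sketch.
        self.grow_threshold = grow_threshold  # max distance before a new node is added
        self.learning_rate = learning_rate    # step size for moving the winning center
        self.centers = []                     # one list of coordinates per node

    def nearest(self, state):
        """Return (index, distance) of the closest center, or (None, inf) if empty."""
        best_index, best_dist = None, math.inf
        for i, center in enumerate(self.centers):
            d = math.dist(center, state)
            if d < best_dist:
                best_index, best_dist = i, d
        return best_index, best_dist

    def observe(self, state):
        """Map a continuous state to a node index, growing a node if needed."""
        index, dist = self.nearest(state)
        if index is None or dist > self.grow_threshold:
            # The state is not covered well enough: grow a new node at the state.
            self.centers.append(list(state))
            return len(self.centers) - 1
        # Otherwise move the winning center a small step toward the state.
        center = self.centers[index]
        for k in range(len(center)):
            center[k] += self.learning_rate * (state[k] - center[k])
        return index
```

Under this rule, regions that the agent visits often and that are spread out accumulate more nodes, which is one way the partition resolution could follow the distribution of states.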

English abstract:

State variables of real-world problems are usually continuous, real-valued variables. However, standard reinforcement learning methods are only suitable for problems with finitely many discrete states. To apply them to real-world problems, the representation of continuous states must be handled properly. There are two main kinds of methods: parameterized function approximation and discretization. Based on an analysis of the advantages and disadvantages of current adaptive partition methods, a partition method based on node-growing k-means clustering is proposed. Reinforcement learning methods based on the proposed clustering algorithm are presented for both discrete-action and continuous-action problems. Simulations are conducted on the mountain-car problem with discrete actions and on the double integrator problem with continuous actions. The results show that the proposed method adaptively adjusts the partition resolution and achieves an adaptive partition of the continuous state space, while learning the optimal policy at the same time.
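
For the discrete-action case, the keywords name Sarsa learning. The sketch below shows the standard tabular Sarsa update, Q(s, a) ← Q(s, a) + α [r + γ Q(s', a') − Q(s, a)], over discrete state labels; such a label could be the index of a partition cell of the continuous state space. How the paper couples the update with its clustering step is not stated in this record, so the class `TabularSarsa`, its default parameters, and the epsilon-greedy action choice are assumptions for illustration only.

```python
import random
from collections import defaultdict


class TabularSarsa:
    """Standard tabular Sarsa with epsilon-greedy exploration (illustrative sketch)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
        self.n_actions = n_actions
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability
        # Q-values default to zero for any state label seen for the first time.
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.rng = random.Random(seed)

    def choose(self, state):
        """Pick a random action with probability epsilon, else a greedy one."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.q[state]
        return max(range(self.n_actions), key=values.__getitem__)

    def update(self, s, a, reward, s_next, a_next, done=False):
        """Apply one on-policy Sarsa update for the transition (s, a, r, s', a')."""
        if done:
            target = reward  # no bootstrap from a terminal state
        else:
            target = reward + self.gamma * self.q[s_next][a_next]
        self.q[s][a] += self.alpha * (target - self.q[s][a])
```

Because Sarsa is on-policy, `a_next` must be the action the agent will actually take in `s_next`, chosen with the same `choose` rule before `update` is called.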
