The policy-gradient method is an important class of reinforcement learning algorithms, with significant application value for autonomous robot navigation. Building on partially observable Markov decision processes (POMDPs), two finite-memory policy-gradient algorithms were implemented: the model-based GAMP algorithm and the model-free IState-GPOMDP algorithm. Both were applied to a simulation of a robot navigating a maze. Based on an analysis of the simulation results, an observation-based optimization was introduced into both algorithms. It was also found that, under the given reward function, the step-size parameter of the policy-gradient method affects the efficiency of policy optimization to some extent.
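To make the role of the step-size parameter concrete, the following is a minimal sketch of a GPOMDP-style gradient update (the estimator family that IState-GPOMDP extends with internal state), assuming a tabular softmax policy over observations; the environment interface, the hyperparameter values, and all names here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gpomdp_update(env, theta, alpha=0.01, beta=0.9, T=1000,
                  rng=np.random.default_rng()):
    """One policy-gradient step; theta has shape (n_obs, n_actions).

    `env` is a hypothetical maze environment with reset() -> obs and
    step(a) -> (obs, reward). `alpha` is the step-size parameter whose
    effect on optimization efficiency the abstract discusses.
    """
    z = np.zeros_like(theta)      # eligibility trace
    delta = np.zeros_like(theta)  # running gradient estimate
    obs = env.reset()
    for t in range(1, T + 1):
        probs = softmax(theta[obs])
        a = rng.choice(len(probs), p=probs)
        # gradient of log pi(a | obs; theta) for a tabular softmax policy
        grad_log = np.zeros_like(theta)
        grad_log[obs] = -probs
        grad_log[obs, a] += 1.0
        z = beta * z + grad_log              # discounted eligibility trace
        obs, r = env.step(a)
        delta += (r * z - delta) / t         # running average of r_t * z_t
    return theta + alpha * delta             # step size scales the update
```

In this sketch the policy is conditioned directly on the current observation; IState-GPOMDP additionally parameterizes transitions over a finite internal memory, which is what allows it to cope with partial observability in the maze.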