A Dual Reinforcement Learning Algorithm Based on Bias Learning

Journal: Journal of Computer Research and Development (计算机研究与发展), 45(9): 1455-1462, 2008.
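Classification: TP18 [Automation and Computer Technology: Control Science and Engineering; Automation and Computer Technology: Control Theory and Control Engineering]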
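Author affiliations: [1] Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; [2] Graduate University of Chinese Academy of Sciences, Beijing 100049; [3] Beijing Key Laboratory of Intelligent Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing 100876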
Funding: the 863 National High-Tech Program (No. 2007AA012132; the English acknowledgement gives No. 2007AA01Z132), the National Basic Research Priorities Programme (973 Program, No. 2003CB317004), and the National Natural Science Foundation of China (No. 60775035, 90604017).
Related project: Intelligent Computing Model Based on Perceptual Learning and Linguistic Cognition

Keywords: reinforcement learning, Q-learning, bias information, bias learning, prior knowledge

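Chinese abstract (translated):

Conventional reinforcement learning suffers from problems such as slow convergence. Presetting certain biases based on prior knowledge can speed up learning, but when the prior knowledge is incorrect, the learning process may fail to converge. To address this, a dual (two-layer) reinforcement learning model based on bias information learning is proposed. The model couples the reinforcement learning process with the bias information learning process: bias information guides the action selection strategy of reinforcement learning, while reinforcement learning in turn guides the bias information learning process. The method makes effective use of prior knowledge while eliminating the influence of incorrect prior knowledge. Experiments on maze problems show that the method converges stably to the optimal policy, and that it uses prior knowledge effectively to improve learning efficiency and accelerate convergence.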
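The abstract does not say how bias information enters action selection. A minimal sketch, assuming the bias is an additive term beta * B(s, a) on top of the Q-values inside an epsilon-greedy rule (the function `select_action` and the parameters `beta` and `epsilon` are hypothetical, not taken from the paper), shows why a preset bias decides early choices and why a fixed, wrong bias can keep overriding what has been learned:

```python
import random
from typing import Optional, Sequence


def select_action(q_row: Sequence[float], bias_row: Sequence[float],
                  beta: float, epsilon: float,
                  rng: Optional[random.Random] = None) -> int:
    """Epsilon-greedy choice over Q(s, a) + beta * B(s, a).

    Hypothetical combination rule for illustration only; the paper's
    actual way of mixing bias information into action selection may differ.
    """
    rng = rng or random.Random()
    if rng.random() < epsilon:
        # Exploration: uniform random action.
        return rng.randrange(len(q_row))
    scores = [q + beta * b for q, b in zip(q_row, bias_row)]
    # Exploitation: highest combined score, ties broken by lowest index.
    return max(range(len(scores)), key=scores.__getitem__)


# Before any learning all Q-values are equal, so the preset bias alone decides.
print(select_action([0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], beta=2.0, epsilon=0.0))  # 1
# Once Q clearly favours action 2, a small beta lets the learned values win...
print(select_action([-3.0, -3.0, -1.0, -3.0], [0.0, 1.0, 0.0, 0.0], beta=1.0, epsilon=0.0))  # 2
# ...but a strong bias that is never revised keeps overriding them.
print(select_action([-3.0, -3.0, -1.0, -3.0], [0.0, 1.0, 0.0, 0.0], beta=5.0, epsilon=0.0))  # 1
```

The third call illustrates, under these assumptions, the non-convergence risk the abstract mentions: if the bias is fixed and wrong, the learned values alone cannot correct the choice.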

English abstract:

Reinforcement learning has received much attention in the past decade. Its incremental nature and adaptive capabilities make it suitable for domains such as automatic control, mobile robotics and multi-agent systems. A critical problem in conventional reinforcement learning is the slow convergence of the learning process. To accelerate learning, bias information derived from prior knowledge can be incorporated into the learning process. Current methods use bias information in the action selection strategy of reinforcement learning, but they may fail to converge when the prior knowledge is incorrect. This paper proposes a dual reinforcement learning model based on bias learning, which integrates the reinforcement learning process with the bias learning process: bias information guides action selection in reinforcement learning, and reinforcement learning in turn guides the bias learning process. The dual model can thus make effective use of prior knowledge while eliminating the negative effects of incorrect prior knowledge. Finally, the model is validated by experiments on maze problems in both simple and complex environments. The experimental results show that the model converges stably to the optimal policy, improves learning performance and speeds up the convergence of the learning process.
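The abstract describes the two layers only in words. The following is a minimal sketch under stated assumptions, not the authors' algorithm: a hypothetical 4x4 open-grid maze with a cost of 1 per move, tabular Q-learning as the reinforcement-learning layer, the additive score Q(s, a) + beta * B(s, a) for action selection, and a hypothetical bias-learning rule that moves B(s, .) toward the action that is greedy under the current Q-values. The maze, all parameter values and the update rule for B are illustrative choices. The prior is deliberately wrong (every state is biased toward "up", away from the goal), so the run shows the bias layer unlearning it.

```python
import random

# Hypothetical 4x4 open-grid maze (not the paper's environment):
# start in the top-left corner, goal in the bottom-right corner.
SIZE = 4
START, GOAL = (0, 0), (SIZE - 1, SIZE - 1)
ACTIONS = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # up, right, down, left


def step(state, a):
    """Deterministic move; bumping into the border leaves the agent in place."""
    r, c = state[0] + ACTIONS[a][0], state[1] + ACTIONS[a][1]
    nxt = (r, c) if 0 <= r < SIZE and 0 <= c < SIZE else state
    return nxt, -1.0, nxt == GOAL  # every move costs 1; reaching GOAL ends the episode


def argmax(values):
    """Index of the largest value; ties go to the lowest index."""
    return max(range(len(values)), key=values.__getitem__)


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1,
          beta=2.0, eta=0.1, max_steps=100, seed=0):
    rng = random.Random(seed)
    states = [(r, c) for r in range(SIZE) for c in range(SIZE)]
    q = {s: [0.0] * len(ACTIONS) for s in states}
    # Deliberately wrong prior knowledge: every state is biased toward "up".
    bias = {s: [1.0, 0.0, 0.0, 0.0] for s in states}
    for _ in range(episodes):
        s = START
        for _ in range(max_steps):
            # Layer 1: bias guides action selection (epsilon-greedy on Q + beta * B).
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = argmax([qv + beta * bv for qv, bv in zip(q[s], bias[s])])
            s2, reward, done = step(s, a)
            # Standard tabular Q-learning update.
            target = reward if done else reward + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            # Layer 2 (hypothetical rule): Q-values guide bias learning by moving
            # B(s, .) toward the action that is currently greedy under Q.
            best = argmax(q[s])
            bias[s] = [b + eta * ((1.0 if i == best else 0.0) - b)
                       for i, b in enumerate(bias[s])]
            s = s2
            if done:
                break
    return q, bias


def greedy_path_length(q, limit=50):
    """Moves needed to reach GOAL by following argmax Q, or None if it never does."""
    s, moves = START, 0
    while s != GOAL and moves < limit:
        s, _, _ = step(s, argmax(q[s]))
        moves += 1
    return moves if s == GOAL else None


q_table, bias_table = train()
print(greedy_path_length(q_table))  # the shortest route in this grid takes 6 moves
print(bias_table[START])            # learned bias at START; the initial "up" weight decays
```

Because the bias is pulled toward whatever the Q-values currently favour, a wrong prior in this sketch only delays learning instead of locking the agent into bumping against the wall; with the bias frozen and a large `beta`, the same agent could keep repeating the biased action. Tabular Q-learning is used here only because the keywords mention it; the paper's actual update rules may differ.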
