Deep reinforcement learning, formed by combining modern reinforcement learning with deep learning, is a new research hotspot in the field of artificial intelligence and has achieved substantial breakthroughs in a variety of tasks that require perceiving high-dimensional raw input data and performing decision control. In particular, a model known as the Deep Q-Network has shown performance comparable to human players when handling problems approaching real-world complexity, such as Atari 2600 games. However, in situations where delayed rewards require planning over long time horizons to optimize a policy, the performance of the Deep Q-Network drops sharply. This indicates that the Deep Q-Network is not good at solving strategic deep reinforcement learning tasks. To address this problem, this paper improves the conventional Deep Q-Network with a recurrent neural network equipped with a visual attention mechanism and proposes a more complete deep reinforcement learning model. The new model rests on two key ideas: first, it uses a recurrent neural network module composed of two-layer gated recurrent units to remember historical information over longer spans of time steps, which enables the agent to use delayed feedback rewards in time to correctly guide the selection of its next action; second, it uses a visual attention mechanism to adaptively focus attention on smaller but more valuable image regions, so that the agent can learn a near-optimal policy more efficiently. The effectiveness of the new model is evaluated on several classic strategic Atari 2600 games. Experimental results show that, compared with conventional deep reinforcement learning models, the new model achieves good performance and higher stability on strategic tasks.
Reinforcement learning, studied for more than fifty years, investigates how an autonomous agent can learn what to do to maximize a numerical reward signal from interaction with the world, balancing exploration of the environment against exploitation of knowledge gained through evaluative feedback, without relying on exemplary supervision from an omniscient teacher or on complete models of the environment. Deep learning is a cutting-edge approach to machine learning concerned with using multi-layer artificial neural networks to learn complicated representations expressed in terms of simpler ones. Deep reinforcement learning, formed by combining modern reinforcement learning with deep learning, is becoming a new research hotspot in the artificial intelligence community and has made substantial breakthroughs in a variety of tasks, such as robot control, text recognition, and games, that require both rich perception of high-dimensional raw inputs and policy control. In particular, a state-of-the-art deep reinforcement learning model, termed the Deep Q-Network, is able to achieve human-level control using the same network architecture and hyper-parameters when handling problems approaching real-world complexity, such as some Atari 2600 games. However, the Deep Q-Network's performance falls far below human level in situations with delayed rewards that require planning under uncertainty over long time horizons to optimize strategies. This implies that the Deep Q-Network is not good at controlling agents in strategic deep reinforcement learning tasks. To alleviate this issue, this paper proposes a novel deep reinforcement learning model by improving the Deep Q-Network with recurrent neural networks based on a visual attention mechanism. The new model rests on two key ideas: (1) it uses a recurrent neural network consisting of two-layer gated recurrent units to remember historical information over multiple time steps, so that an agent can exploit delayed feedback rewards in time to correctly guide the selection of its next action; (2) it uses a visual attention mechanism to adaptively focus attention on smaller but more valuable regions of the image, so that the agent can learn a near-optimal policy more efficiently. The effectiveness of the new model is evaluated on several classic strategic Atari 2600 games. Experimental results show that, compared with conventional deep reinforcement learning models, the new model delivers better performance and higher stability on strategic tasks.
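As a rough illustration of the architecture the abstract describes, the sketch below (written in PyTorch) shows a Q-network whose convolutional features are pooled by a soft visual attention module conditioned on the recurrent state and then fed to a two-layer gated recurrent unit. It is a minimal sketch, not the authors' released code: all layer sizes, the additive attention form, and the single 84x84 grayscale frame per time step are assumptions made for illustration.

# Minimal sketch (assumed details, not the authors' implementation) of a Deep
# Q-Network variant whose fully connected head is replaced by soft visual
# attention followed by a two-layer GRU.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionGRUQNetwork(nn.Module):
    def __init__(self, num_actions: int, hidden_size: int = 256):
        super().__init__()
        # DQN-style convolutional encoder (assumed sizes) producing a grid of
        # feature vectors, one per spatial location of the input frame.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        feat_dim = 64
        # Soft attention: scores each spatial location given the previous GRU
        # state, then forms a weighted sum of the location features.
        self.attn_score = nn.Linear(feat_dim + hidden_size, 1)
        # Two-layer GRU carrying information across time steps, so that delayed
        # rewards can still shape the value estimates of earlier observations.
        self.gru = nn.GRU(feat_dim, hidden_size, num_layers=2, batch_first=True)
        self.q_head = nn.Linear(hidden_size, num_actions)

    def forward(self, frames, hidden=None):
        # frames: (batch, time, 1, 84, 84) grayscale observations.
        b, t = frames.shape[:2]
        feats = self.conv(frames.reshape(b * t, *frames.shape[2:]))  # (b*t, 64, H, W)
        feats = feats.flatten(2).transpose(1, 2)                     # (b*t, H*W, 64)
        feats = feats.reshape(b, t, feats.shape[1], feats.shape[2])  # (b, t, L, 64)

        if hidden is None:
            hidden = frames.new_zeros(2, b, self.gru.hidden_size)

        q_values = []
        for step in range(t):
            # Use the top GRU layer's previous state to decide where to look.
            query = hidden[-1].unsqueeze(1).expand(-1, feats.shape[2], -1)
            scores = self.attn_score(torch.cat([feats[:, step], query], dim=-1))
            weights = F.softmax(scores, dim=1)                        # (b, L, 1)
            context = (weights * feats[:, step]).sum(dim=1)           # (b, 64)
            out, hidden = self.gru(context.unsqueeze(1), hidden)      # one time step
            q_values.append(self.q_head(out.squeeze(1)))              # (b, num_actions)
        return torch.stack(q_values, dim=1), hidden                   # (b, t, num_actions)

if __name__ == "__main__":
    net = AttentionGRUQNetwork(num_actions=6)
    obs = torch.rand(2, 8, 1, 84, 84)   # 2 episode fragments, 8 time steps each
    q, h = net(obs)
    print(q.shape)                      # torch.Size([2, 8, 6])

Because the GRU carries information across time steps, a single frame per step suffices in this sketch, in contrast to the frame stacking used by the original Deep Q-Network; this recurrence, together with the attention pooling over image regions, is the mechanism the abstract credits for handling delayed rewards and focusing computation on the more valuable parts of the screen.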