For cyclical tasks with absorbing goal states, reinforcement learning based on the average reward model is a reasonable choice, since it offers fast convergence and strong robustness. This paper introduces the three main algorithms of average reward reinforcement learning, namely R-learning, H-learning and LC-learning, and outlines the main applications and research directions of the field.