针对RoboCup这一典型的多目标强化学习问题,提出一种基于最大集合期望损失的多目标强化学习算法LRGM-Sarsa(λ)算法.该算法预估各个目标的最大集合期望损失,在平衡各个目标的前提下选择最佳联合动作以产生最优联合策略.在单个目标训练的过程中,采用基于改进MSBR误差函数的Sarsa(λ)算法,并对动作选择概率函数和步长参数进行优化,解决了强化学习在使用非线性函数泛化时,算法不稳定、不收敛的问题.将该算法应用到RoboCup射门局部策略训练中,取得了较好的效果,表明该学习算法的有效性.
For solving the multiple-goal problem in RoboCup,a novel multiple-goal Reinforcement Learning algorithm,named LRGM-Sarsa(λ),is proposed.The algorithm estimates the lost reward of the greatest mass of every sub goal and trades off the long term reward of the sub goals to get a composite policy.In the single learning module,B error function,which is based on MSBR error function is proposed.B error function has guaranteed the convergence of the value prediction with the non-linear function approximation.The probability funciton of selecting actions and the parameter α are also improved with respect to B error function.This algorithm is applied to the training of shooting in Robocup 2D.The experimental results show that the proposed algorithm is more stable and converges faster.