Q-learning converges very slowly when the state space is very large. To address this problem, and exploiting the fact that agents differ in their classification performance on different samples, this paper proposes an algorithm that partitions the sample space according to the learning error on each sample, fully exploiting the matching relationship between samples (sub-spaces of the state space) and agents. Simulations in a grid world with obstacles show that the algorithm improves online learning performance.
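The abstract does not say how the learning error is measured or how samples are matched to agents, so the following Python sketch fills those gaps with assumptions of its own, not the paper's method. It assumes (1) the agents are tabular Q-learners that differ only in learning rate, (2) an agent's learning error on a state is an exponential moving average of its absolute TD error there, and (3) each state is assigned to the agent with the smallest such error, which then chooses the action. The grid layout, rewards, and all names (`GRID`, `step`, `assign`, `train`) are hypothetical.

```python
import random

# Hypothetical 5x5 grid world (not the paper's layout): '#' is an obstacle,
# 'S' the start cell and 'G' the goal cell.
GRID = [
    "S...#",
    ".#...",
    ".#.#.",
    "...#.",
    "#...G",
]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, a):
    """Move one cell; bumping into the border or an obstacle leaves the agent in place."""
    r, c = state
    dr, dc = ACTIONS[a]
    nr, nc = r + dr, c + dc
    if not (0 <= nr < len(GRID) and 0 <= nc < len(GRID[0])) or GRID[nr][nc] == "#":
        nr, nc = r, c
    done = GRID[nr][nc] == "G"
    return (nr, nc), (1.0 if done else -0.01), done


def assign(err, s):
    """Assumed matching rule: the agent with the smallest learning error on s owns s."""
    return min(range(len(err)), key=lambda k: err[k].get(s, 0.0))


def train(episodes=300, alphas=(0.1, 0.5), gamma=0.95, eps=0.1,
          beta=0.1, max_steps=200, seed=0):
    rng = random.Random(seed)
    n_agents, n_actions = len(alphas), len(ACTIONS)
    Q = [{} for _ in alphas]    # Q[k][(state, action)] -> action value of agent k
    err = [{} for _ in alphas]  # err[k][state] -> moving average of |TD error|

    def q(k, s, a):
        return Q[k].get((s, a), 0.0)

    lengths = []
    for _ in range(episodes):
        s, done, t = (0, 0), False, 0
        while not done and t < max_steps:
            k = assign(err, s)  # the agent that owns s chooses the action
            if rng.random() < eps:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda b: q(k, s, b))
            s2, r, done = step(s, a)
            # Every agent learns from the transition (Q-learning is off-policy),
            # so each one accumulates its own learning error on every visited state.
            for j in range(n_agents):
                best_next = 0.0 if done else max(q(j, s2, b) for b in range(n_actions))
                delta = r + gamma * best_next - q(j, s, a)
                Q[j][(s, a)] = q(j, s, a) + alphas[j] * delta
                e = err[j].get(s, 0.0)
                err[j][s] = e + beta * (abs(delta) - e)
            s, t = s2, t + 1
        lengths.append(t)
    return Q, err, lengths
```

After `Q, err, lengths = train()`, the partition of the visited states is `{s: assign(err, s) for s in err[0]}`. Letting all agents learn from every transition is a simplifying choice that gives each agent an error estimate everywhere; the paper may instead train each agent only on its own sub-space.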