When a multi-robot system navigating in formation through an unknown environment encounters a long obstacle, the choice between bypassing it clockwise or counterclockwise strongly affects navigation efficiency. To address this problem, a three-level reinforcement learning method is proposed. The high level uses online learning based on condition-behavior pairs to adapt to dynamic changes in the obstacles and to select the bypass direction. The middle level uses a Role-Cross-Subsumption control structure to maintain the formation. The low level uses a conventional offline reinforcement learning mechanism to acquire collision-avoidance rules. Simulation results show that, because online learning is confined to the high level, the learning space is reduced and the learning time is shortened. The method provides an effective autonomous learning strategy for multi-robot formation navigation in complex environments.