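The unmanned surface vehicle (USV) is an important class of autonomous marine robot that is being widely studied and is gradually entering practical use. However, safe navigation still severely limits improvements in USV autonomy; in particular, hazard avoidance under complex sea states remains an urgent open problem. Building on the Sarsa on-policy reinforcement learning algorithm, this paper proposes an adaptive hazard-avoidance decision model for USVs in complex sea states. With a greedy in the limit with infinite exploration (GLIE) strategy as the action exploration strategy, the adaptive hazard-avoidance decision process is proved to converge to the optimal action policy with probability 1. The result shows that using an on-policy reinforcement learning algorithm to improve USV hazard-avoidance performance under complex sea states is feasible.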
Unmanned surface vehicles (USVs) are an important class of autonomous marine robot, and they are increasingly being studied and applied in practice. However, USV autonomy is still limited by the performance of autonomous navigation technology; in particular, adaptive obstacle avoidance in complicated sea-state marine environments remains an urgent problem. This paper proposes an adaptive avoidance decision process model for USV obstacle avoidance in such environments. The model is constructed on the basis of the Sarsa on-policy reinforcement learning algorithm, following an analysis of the disturbance factors that complicated sea states introduce. With GLIE (greedy in the limit with infinite exploration) as the action exploration strategy, the adaptive avoidance decision process is proved to converge to the optimal action policy with probability one. This result indicates that an on-policy reinforcement learning algorithm can enhance USV obstacle avoidance performance in complicated sea-state marine environments.
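For readers unfamiliar with the two ingredients named above, the following is a minimal sketch of tabular Sarsa with a GLIE exploration schedule. It is not the paper's model: the grid world, the obstacle layout, the reward values, the random "drift" standing in for sea-state disturbance, and all function names are assumptions made purely for illustration. The schedule eps_k = 1/k is one common example of a GLIE schedule. The constant step size is also a simplification, since standard convergence results for Sarsa additionally assume suitably decaying step sizes.

```python
import random

# Hypothetical toy world (not from the paper): a 5x5 grid with a start cell,
# a goal cell, and a few obstacle cells.
SIZE = 5
START, GOAL = (0, 0), (4, 4)
OBSTACLES = {(1, 2), (2, 2), (3, 1)}
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # four unit moves


def glie_epsilon(episode):
    """Exploration rate for episode k = 1, 2, ...; eps_k = 1/k tends to 0."""
    return 1.0 / episode


def choose_action(Q, state, eps, rng):
    """Epsilon-greedy: explore with probability eps, otherwise act greedily."""
    if rng.random() < eps:
        return rng.randrange(len(ACTIONS))
    values = Q[state]
    return max(range(len(ACTIONS)), key=lambda a: values[a])


def step(state, action, rng, drift_prob=0.1):
    """Move one cell. With probability drift_prob a random push replaces the
    commanded move (an assumed, crude stand-in for wave and wind disturbance)."""
    if rng.random() < drift_prob:
        action = rng.randrange(len(ACTIONS))
    dx, dy = ACTIONS[action]
    x = min(max(state[0] + dx, 0), SIZE - 1)
    y = min(max(state[1] + dy, 0), SIZE - 1)
    nxt = (x, y)
    if nxt in OBSTACLES:
        return nxt, -10.0, True  # collision ends the episode
    if nxt == GOAL:
        return nxt, 10.0, True
    return nxt, -1.0, False


def sarsa_update(Q, s, a, r, s2, a2, done, alpha=0.1, gamma=0.95):
    """On-policy TD update: the target uses the action a2 actually selected."""
    target = r if done else r + gamma * Q[s2][a2]
    Q[s][a] += alpha * (target - Q[s][a])


def train(episodes=500, max_steps=200, seed=0):
    """Run Sarsa with the GLIE schedule and return the learned Q-table."""
    rng = random.Random(seed)
    Q = {(x, y): [0.0] * len(ACTIONS) for x in range(SIZE) for y in range(SIZE)}
    for k in range(1, episodes + 1):
        eps = glie_epsilon(k)
        s = START
        a = choose_action(Q, s, eps, rng)
        for _ in range(max_steps):
            s2, r, done = step(s, a, rng)
            a2 = choose_action(Q, s2, eps, rng)
            sarsa_update(Q, s, a, r, s2, a2, done)
            s, a = s2, a2
            if done:
                break
    return Q


if __name__ == "__main__":
    Q = train()
    print("Greedy action index at start:", choose_action(Q, START, 0.0, random.Random(0)))
```

The point of the sketch is the on-policy structure: the next action `a2` is drawn from the same epsilon-greedy policy that is being improved, and that same action is both used in the update target and executed on the next step. As the exploration rate shrinks, the behaviour policy approaches the greedy one, which is the setting in which the convergence claim in the abstract is stated.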