To address the conflict between model-driven route planning and the diversity of human cognitive preferences in spatial-cognition-oriented route selection, we propose an interactive route planning approach based on hierarchical reinforcement learning. The approach translates optimal-route criteria into immediate rewards for turning decisions at intersections, and finds the optimal route policy, the one with maximal cumulative reward, through a two-stage learning process. The pre-learning stage automatically identifies nodes in the road network as subgoals and constructs the corresponding subtasks, each containing a locally optimal route policy for reaching its subgoal. The real-time learning stage uses predefined policies to update the Q values of the available state-action pairs efficiently, and then traces the optimal route from the Q values. Experimental results show that the approach learns efficiently enough for real-time use and finds routes close to the globally optimal ones.
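As a rough illustration of the last step described above (updating Q values from immediate rewards, then tracing a route from them), the following minimal sketch runs a flat tabular Q-value backup on a toy road network. It is not the method of this paper: the subgoal discovery and subtask hierarchy are omitted, the network `ROADS`, its reward values, and the helper names `learn_q` and `trace_route` are hypothetical, and negative edge costs stand in for the rewards of turning decisions.

```python
from collections import defaultdict

# Hypothetical toy road network: intersection -> {next intersection: immediate reward}.
# Negative costs stand in for the rewards assigned to turning decisions.
ROADS = {
    "A": {"B": -1.0, "C": -4.0},
    "B": {"C": -1.0, "D": -5.0},
    "C": {"D": -1.0},
    "D": {},
}
GOAL = "D"


def learn_q(roads, goal, gamma=1.0, sweeps=10):
    """Sweep all state-action pairs, applying the Q-learning backup with learning rate 1."""
    q = defaultdict(float)  # Q values start at 0 for every (state, action) pair
    for _ in range(sweeps):
        for state, actions in roads.items():
            for nxt, reward in actions.items():
                if nxt == goal:
                    future = 0.0  # no further reward once the goal is reached
                else:
                    # Best Q value available from the next intersection
                    # (-inf if it is a dead end with no outgoing roads).
                    future = max(
                        (q[(nxt, a)] for a in roads[nxt]),
                        default=float("-inf"),
                    )
                q[(state, nxt)] = reward + gamma * future
    return q


def trace_route(q, roads, start, goal):
    """Follow the highest-valued action from each intersection until the goal."""
    route = [start]
    while route[-1] != goal:
        here = route[-1]
        if not roads[here] or len(route) > len(roads):
            raise ValueError(f"no route to {goal} found from {start}")
        route.append(max(roads[here], key=lambda nxt: q[(here, nxt)]))
    return route


if __name__ == "__main__":
    q_values = learn_q(ROADS, GOAL)
    print(trace_route(q_values, ROADS, "A", GOAL))
```

On this toy network the greedy trace from `A` follows `A`, `B`, `C`, `D`, whose cumulative reward of -3 beats the alternatives through the direct `A` to `C` or `B` to `D` segments. A hierarchical variant would presumably replace some of the single-step actions with subtask policies that lead to subgoal nodes, which is the part this sketch leaves out.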