In hierarchical reinforcement learning, the hierarchy of a Markov Decision Process (MDP) usually has to be constructed manually, and existing automatic hierarchy-construction methods based on the state space perform poorly when the environment states contain no obvious subgoals. To address these problems, an automatic hierarchy-construction method based on the action space was proposed. Firstly, the action set was partitioned into disjoint subsets according to the state components that each action affects. Then, the actions available to the Agent in different states were analyzed, and the bottleneck actions were identified. Finally, the upper-lower relationships between action subsets were determined from the bottleneck actions and the execution order of actions, and the hierarchy was constructed. In addition, the termination conditions of subtasks in the MAXQ method were modified, so that the MAXQ method can find the optimal policy on the hierarchy constructed by the proposed algorithm. Experimental results show that the proposed algorithm constructs the hierarchy automatically and is not disturbed by environmental changes. Compared with the Q-Learning and Sarsa algorithms, the MAXQ method with the constructed hierarchy obtains the optimal policy in less time and achieves higher returns. These results verify that the proposed algorithm can effectively and automatically construct a MAXQ hierarchy and makes finding the optimal policy more efficient.
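The abstract describes the three construction steps only at a high level. The following is a minimal, self-contained Python sketch of how steps 1 and 2, plus a simplified version of step 3, might look on a toy taxi-like domain; it is not the paper's actual algorithm. All names, the availability predicates, and the 10% availability threshold for flagging a bottleneck action are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical toy domain (assumption, not from the paper): each action is
# annotated with the state components it affects and with a predicate over
# states saying where it is executable. State = (x, y, passenger_in_taxi).
ACTIONS = {
    "north":   {"affects": ("x", "y"), "available": lambda s: True},
    "south":   {"affects": ("x", "y"), "available": lambda s: True},
    "east":    {"affects": ("x", "y"), "available": lambda s: True},
    "west":    {"affects": ("x", "y"), "available": lambda s: True},
    # pickup/dropoff are executable only in a single state each
    "pickup":  {"affects": ("passenger",), "available": lambda s: s == (0, 0, False)},
    "dropoff": {"affects": ("passenger",), "available": lambda s: s == (3, 3, True)},
}

STATES = [(x, y, p) for x in range(4) for y in range(4) for p in (False, True)]

def partition_by_affected_components(actions):
    """Step 1: group actions whose sets of affected state components coincide."""
    groups = defaultdict(set)
    for name, info in actions.items():
        groups[info["affects"]].add(name)
    return dict(groups)

def bottleneck_actions(actions, states, ratio=0.1):
    """Step 2 (one plausible criterion): flag actions that are executable
    in only a small fraction of states."""
    result = []
    for name, info in actions.items():
        n_available = sum(1 for s in states if info["available"](s))
        if n_available / len(states) < ratio:
            result.append(name)
    return result

def layer_subsets(groups, bottlenecks):
    """Step 3 (simplified): place subsets containing bottleneck actions above
    the rest, reflecting that a bottleneck is reached only after lower-level
    actions have been executed. The paper additionally uses execution order."""
    upper = {k: v for k, v in groups.items() if v & set(bottlenecks)}
    lower = {k: v for k, v in groups.items() if not v & set(bottlenecks)}
    return upper, lower

if __name__ == "__main__":
    groups = partition_by_affected_components(ACTIONS)
    bottlenecks = bottleneck_actions(ACTIONS, STATES)
    print("action subsets:", groups)        # movement vs. passenger actions
    print("bottleneck actions:", bottlenecks)  # ['pickup', 'dropoff']
    print("layers:", layer_subsets(groups, bottlenecks))
```

In this sketch the passenger-manipulating subset ends up above the navigation subset, mirroring the kind of MAXQ task graph (root, get/put subtasks, primitive moves) that the taxi domain is usually decomposed into.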