Building on the performance sensitivity analysis of Markov decision processes (MDPs), this paper addresses the performance optimization of partially observable Markov decision processes (POMDPs). Performance sensitivity formulas for POMDPs are derived, and on this basis two observation-based optimization algorithms are proposed: a policy-gradient algorithm and a policy-iteration algorithm. Finally, a simulation example based on an admission control problem verifies the effectiveness of both algorithms.