Many optimization algorithms for Markov decision processes (MDPs) depend on the transition rates of the underlying system, but uncertainty in the system parameters often makes these rates hard to determine exactly. For a class of uncertain multichain MDPs, this paper uses performance potentials to study the robust control problem in two cases: independent (uncorrelated) parameters and dependent (correlated) parameters. A policy iteration algorithm and a parallel genetic algorithm are given, respectively, for computing the optimal robust performance of the system. Finally, a numerical example illustrates the effectiveness of these algorithms.
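To make the terms "performance potential" and "robust performance" concrete, the following is a minimal sketch, not the policy iteration or genetic algorithm of the paper. It assumes a two-state ergodic (single-chain) continuous-time model rather than the multichain case, a finite grid of candidate values for the uncertain rate, and a max-min criterion evaluated by brute force. The function names, the reward values, and the rate values are all hypothetical and chosen only for illustration.

```python
import numpy as np

def potentials(A, f):
    """Return (pi, eta, g) for an ergodic generator A and reward vector f.

    pi  : stationary distribution (pi A = 0, pi e = 1)
    eta : average reward pi f
    g   : performance potentials solving (A - e pi) g = -f,
          which implies the Poisson equation A g = -(f - eta e).
    """
    n = A.shape[0]
    e = np.ones(n)
    # Stack pi A = 0 and pi e = 1 into one consistent linear system.
    M = np.vstack([A.T, e])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi = np.linalg.lstsq(M, b, rcond=None)[0]
    eta = float(pi @ f)
    g = np.linalg.solve(A - np.outer(e, pi), -f)
    return pi, eta, g

def generator(a, b):
    """Two-state generator: rate a for 0 -> 1 (controlled), b for 1 -> 0 (uncertain)."""
    return np.array([[-a, a], [b, -b]], dtype=float)

def reward(a):
    """Hypothetical reward vector: a faster controlled rate costs more in state 0."""
    return np.array([1.0 - 0.6 * a, 3.0])

def worst_case_eta(a, uncertain_rates):
    """Worst-case average reward of action a over the candidate uncertain rates."""
    return min(potentials(generator(a, b), reward(a))[1] for b in uncertain_rates)

def robust_best_action(actions, uncertain_rates):
    """Brute-force max-min: the action with the best worst-case average reward."""
    best = max(actions, key=lambda a: worst_case_eta(a, uncertain_rates))
    return best, worst_case_eta(best, uncertain_rates)

if __name__ == "__main__":
    best, value = robust_best_action([1.0, 2.0], [0.5, 1.0, 2.0])
    print("robust action:", best, "worst-case average reward:", value)
```

Enumerating every action and every parameter value is feasible only for toy models like this one; it is meant to show what quantity a robust algorithm optimizes, not how to compute it efficiently.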