Under the average criterion for continuous-time Markov decision processes, a new set of conditions is proposed for both average optimality and constrained average optimality of a special Markov decision process, namely a controlled queueing system. These conditions involve only the primitive data of the model, but rely on the ergodicity theory of birth-and-death processes. Using the Lagrange multiplier approach, it is shown that the controlled queueing system admits both an average-optimal stationary policy and a constrained average-optimal policy.
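Two ingredients of the abstract can be illustrated numerically: long-run averages computed from the stationary law of an ergodic birth-and-death process, and the Lagrange multiplier reduction of a constrained problem. The following is a minimal Python sketch under assumptions that are ours, not the paper's: the queue is truncated at a finite capacity `N`, there are two hypothetical service modes with made-up rates and costs, and policies are found by brute force rather than through the paper's optimality conditions. For a fixed multiplier γ, minimizing the average of c + γd is an ordinary average-cost problem, so the same routine applies; a constrained-optimal policy may in general need to randomize between Lagrangian minimizers, which this sketch does not attempt.

```python
from itertools import product

# Hypothetical data, NOT taken from the paper: a single-server queue truncated
# at capacity N, arrivals at rate LAM, and two service modes.
N = 5                                       # states 0..N (arrivals blocked at N)
LAM = 1.0                                   # birth (arrival) rate in states 0..N-1
MU = {"slow": 1.0, "fast": 2.0}             # death (service) rate of each action
SERVICE_COST = {"slow": 0.0, "fast": 1.5}   # cost per unit time of each mode


def stationary_distribution(policy):
    """Stationary law of the birth-death chain induced by a stationary policy.

    policy[i] is the action used in state i. Because the chain is an irreducible
    birth-death chain on {0, ..., N}, pi_i is proportional to
    prod_{k=1}^{i} LAM / MU[policy[k]].
    """
    weights = [1.0]
    for i in range(1, N + 1):
        weights.append(weights[-1] * LAM / MU[policy[i]])
    total = sum(weights)
    return [w / total for w in weights]


def average_cost(policy, cost):
    """Long-run average cost sum_i pi_i * cost(i, policy[i])."""
    pi = stationary_distribution(policy)
    return sum(p * cost(i, policy[i]) for i, p in enumerate(pi))


def service_cost(i, a):
    """Objective cost rate c(i, a): pay for the chosen service speed."""
    return SERVICE_COST[a]


def queue_length(i, a):
    """Constraint cost rate d(i, a): number of customers in the system."""
    return float(i)


def lagrangian(gamma, cost=service_cost, constraint=queue_length):
    """For a fixed multiplier gamma >= 0, the combined cost rate c + gamma * d."""
    return lambda i, a: cost(i, a) + gamma * constraint(i, a)


def best_policy(cost):
    """Brute-force the deterministic stationary policy minimizing average cost."""
    return min(product(MU, repeat=N + 1), key=lambda pol: average_cost(pol, cost))


if __name__ == "__main__":
    for gamma in (0.0, 1.0, 100.0):
        pol = best_policy(lagrangian(gamma))
        print(gamma, pol,
              round(average_cost(pol, service_cost), 4),
              round(average_cost(pol, queue_length), 4))
```

Increasing γ moves the minimizer from slow service everywhere toward fast service in every nonempty state, trading a higher service cost for a shorter average queue.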