This paper studies denumerable continuous-time Markov decision processes with the expected total reward criterion. The authors first study the unconstrained model with possibly unbounded transition rates, and give suitable conditions on the controlled system's primitive data under which they show the existence of a solution to the total reward optimality equation and the existence of an optimal stationary policy. Then the authors impose a constraint on an expected total cost and consider the associated constrained model. Based on the results for the unconstrained model and using the Lagrange multiplier approach, they prove the existence of constrained-optimal policies under some additional conditions. Finally, they apply the results to controlled queueing systems.
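For orientation, the total reward optimality equation referred to above typically takes the following standard form for continuous-time Markov decision processes; the notation here (state space $S$, admissible actions $A(i)$, reward rates $r(i,a)$, transition rates $q(j\mid i,a)$, value function $V^*$) is a common convention and is assumed rather than taken from the abstract:

```latex
% Total reward optimality equation for a CTMDP (standard form; notation assumed):
% S: denumerable state space, A(i): admissible actions in state i,
% r(i,a): reward rate, q(j|i,a): transition rates with q(i|i,a) = -sum_{j != i} q(j|i,a).
0 \;=\; \sup_{a \in A(i)} \Big[\, r(i,a) \;+\; \sum_{j \in S} q(j \mid i, a)\, V^*(j) \Big],
\qquad i \in S .
```

A stationary policy attaining the supremum in every state is then a natural candidate for an optimal policy, which is the kind of existence result the unconstrained part of the paper establishes.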