This paper focuses on the constrained optimality problem (COP) for first-passage discrete-time Markov decision processes (DTMDPs) with a denumerable state space, compact Borel action spaces, multiple constraints, state-dependent discount factors, and possibly unbounded costs. Using the properties of the so-called occupation measure of a policy, we show that the constrained optimality problem is equivalent to an (infinite-dimensional) linear program over the set of occupation measures subject to the corresponding constraints, and thus prove the existence of an optimal policy under suitable conditions. Furthermore, using this equivalence between the constrained optimality problem and the linear program, we obtain an exact form of an optimal policy for the case of finite state and action spaces. Finally, a controlled queueing system is given as an example to illustrate our results.
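To give a flavor of the occupation-measure approach in the finite case, the following is a minimal sketch on a hypothetical toy instance (not taken from the paper): two states, two actions, a single constant discount factor in place of the paper's state-dependent ones, and one constraint. The occupation measure x(s,a) becomes a finite LP variable, the balance equations give the equality constraints, the extra cost bounded by a budget gives an inequality constraint, and a stationary (possibly randomized) optimal policy is recovered by normalizing the optimal occupation measure.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy instance: 2 states, 2 actions, constant discount factor.
nS, nA = 2, 2
gamma = 0.9
mu = np.array([1.0, 0.0])                # initial distribution

# P[s, a, s']: transition kernel (deterministic here for simplicity)
P = np.zeros((nS, nA, nS))
P[0, 0, 0] = 1.0; P[0, 1, 1] = 1.0
P[1, 0, 0] = 1.0; P[1, 1, 1] = 1.0

c = np.array([[2.0, 0.0], [0.0, 1.0]])   # cost to be minimized
d = np.array([[0.0, 1.0], [0.0, 0.0]])   # constraint cost
kappa = 2.0                              # budget on expected discounted d-cost

# LP over occupation measures x(s,a) >= 0, flattened by idx = s*nA + a.
# Balance equations: sum_a x(s',a) - gamma * sum_{s,a} P(s'|s,a) x(s,a) = mu(s')
A_eq = np.zeros((nS, nS * nA))
for sp in range(nS):
    for s in range(nS):
        for a in range(nA):
            A_eq[sp, s * nA + a] = float(s == sp) - gamma * P[s, a, sp]

res = linprog(c=c.ravel(),
              A_ub=[d.ravel()], b_ub=[kappa],   # constraint: <d, x> <= kappa
              A_eq=A_eq, b_eq=mu,
              bounds=(0, None), method="highs")

x = res.x.reshape(nS, nA)
# Recover a stationary randomized policy from the occupation measure:
policy = x / x.sum(axis=1, keepdims=True)
print("optimal constrained value:", res.fun)
print("policy (rows = states, cols = action probabilities):\n", policy)
```

In this instance the cheapest unconstrained policy overuses the d-cost, so the budget constraint binds and the LP vertex corresponds to a genuinely randomized policy, which is exactly the phenomenon that makes the occupation-measure formulation natural for constrained problems.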