STRONG N-DISCOUNT AND FINITE-HORIZON OPTIMALITY FOR CONTINUOUS-TIME MARKOV DECISION PROCESSES

ISSN: 1009-6124
Journal: Journal of Systems Science and Complexity (English edition)
Classification: O231 [Science: Operations Research and Cybernetics; Science: Mathematics]; TJ761.13 [Armament Science and Technology: Weapon Systems and Application Engineering]
Author affiliations: [1] School of Mathematical Sciences and Institute of Finance and Statistics, Nanjing Normal University, Nanjing 210023, China; [2] School of Mathematics and Computational Science, Zhongshan University, Guangzhou 510275, China
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61374080 and 61374067; the Natural Science Foundation of Zhejiang Province under Grant No. LY12F03010; the Natural Science Foundation of Ningbo under Grant No. 2012A610032; and a project funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions.

Keywords: Markov decision process, continuous time, discount, optimal stationary policy, control problem, optimal policy, horizon, reward rate, continuous-time Markov decision process, expected average reward criterion, finite-horizon optimality, Polish space, strong n-discount optimality

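Abstract (translated from the Chinese):

This paper studies the strong n-discount (n = -1, 0) and finite-horizon criteria for continuous-time Markov decision processes in Polish spaces. The corresponding transition rates are allowed to be unbounded, and the reward rates may be bounded neither above nor below. Under mild conditions, the authors prove the existence of strong n-discount (n = -1, 0) optimal stationary policies by developing two equivalence relations: one between the standard expected average reward and strong (-1)-discount optimality, and the other between the bias and strong 0-discount optimality. The authors also prove the existence of an optimal policy for a finite-horizon control problem by developing an interesting characterization of a canonical triplet.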

Abstract (English):

This paper studies the strong n-discount (n = -1, 0) and finite-horizon criteria for continuous-time Markov decision processes in Polish spaces. The corresponding transition rates are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. Under mild conditions, the authors prove the existence of strong n-discount (n = -1, 0) optimal stationary policies by developing two equivalence relations: one is between the standard expected average reward and strong (-1)-discount optimality, and the other is between the bias and strong 0-discount optimality. The authors also prove the existence of an optimal policy for a finite-horizon control problem by developing an interesting characterization of a canonical triplet.
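
For orientation, the criteria named in the abstract are usually stated along the following lines in the literature on continuous-time Markov decision processes. This is only a sketch of the standard definitions, not text taken from the paper: the symbols V_\alpha, r, \pi, \Pi, x_t and a_t are notation assumed here, and the paper's own conditions (unbounded transition rates, reward rates unbounded in both directions) are not reproduced.

```latex
% Assumed notation: r is the reward rate, (x_t, a_t) the state-action process,
% \Pi the class of admissible policies, \alpha > 0 the discount factor.

% Expected \alpha-discounted reward of policy \pi from initial state x,
% and the corresponding optimal value:
V_\alpha(x,\pi) = \mathbb{E}_x^{\pi}\!\left[\int_0^\infty e^{-\alpha t}\, r(x_t,a_t)\,dt\right],
\qquad
V_\alpha^*(x) = \sup_{\pi\in\Pi} V_\alpha(x,\pi).

% A policy \pi^* is strong n-discount optimal (n = -1, 0) if, for every state x,
\lim_{\alpha\downarrow 0} \alpha^{-n}\left[V_\alpha(x,\pi^*) - V_\alpha^*(x)\right] = 0.
```

With n = -1 the weight \alpha^{-n} equals \alpha, which matches the abstract's link between strong (-1)-discount optimality and the expected average reward; with n = 0 the weight is 1, the case the abstract relates to the bias.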
