A Measure-Valued Markov Theoretical Model Simulating Human Divergent Thinking

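ISSN: 0469-5097
Journal: 《南京大学学报：自然科学版》 (Journal of Nanjing University: Natural Sciences Edition)
Date: 0
Classification: TP18 [Automation and Computer Technology: Control Science and Engineering; Automation and Computer Technology: Control Theory and Control Engineering]
Author affiliations: [1] School of Computer Science and Engineering, Southeast University, Nanjing 210096; [2] State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093
Funding: National Natural Science Foundation of China (90412014); Novel Computer Software Technology Open Project (A200707)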

Keywords: measure-valued, measure-valued Markov decision processes, measure-valued branching processes, Markov decision processes

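Chinese abstract (translated):

This paper proposes a new model, the measure-valued Markov decision process. In this model, the agent's grasp of the environment is represented by a measure, and the agent chooses its optimal action according to that measure in order to obtain an optimal policy; accordingly, the paper also gives an optimal-policy algorithm for measure-valued Markov decision processes. The model generalizes the partially observable Markov decision process and reflects an important feature of human thinking: people consider problems and choose their optimal actions while holding the full range of possible states in view (that is, while weighing the state space by a measure). The partially observable Markov decision process is only a special case of it.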
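
The special-case relationship stated in the abstract can be illustrated in a finite-state setting. The following is a minimal sketch under assumptions that are not taken from the paper: a two-state space, a hypothetical transition matrix `transition`, a hypothetical observation likelihood vector `likelihood`, and a standard unnormalized filtering update in place of the paper's own algorithm. It only shows that a finite measure on the state space carries a belief state (its normalization) plus a total mass, and that updating the measure and then normalizing agrees with the usual POMDP belief update.

```python
from typing import List

Measure = List[float]  # nonnegative weights over a finite state space


def push_forward(measure: Measure,
                 transition: List[List[float]],
                 likelihood: List[float]) -> Measure:
    """One unnormalized update: propagate through the transition
    matrix, then weight each state by the observation likelihood."""
    n = len(measure)
    predicted = [sum(measure[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
    return [predicted[j] * likelihood[j] for j in range(n)]


def normalize(measure: Measure) -> Measure:
    """Rescale a finite measure to total mass 1 (a belief state)."""
    total = sum(measure)
    if total <= 0:
        raise ValueError("measure must have positive total mass")
    return [m / total for m in measure]


# Hypothetical two-state example (all values are illustrative assumptions).
transition = [[0.9, 0.1],
              [0.2, 0.8]]
likelihood = [0.7, 0.1]      # P(observation | state), assumed
measure = [3.0, 1.0]         # total mass 4.0, so not a probability
belief = normalize(measure)  # the corresponding POMDP belief state

# Update the general measure, and separately update the belief state.
updated_measure = push_forward(measure, transition, likelihood)
updated_belief = normalize(push_forward(belief, transition, likelihood))

# Normalizing the updated measure gives the same belief state.
belief_from_measure = normalize(updated_measure)
```

Because the update is linear in the measure, scaling the initial measure does not change the normalized result; a belief state is the case where the total mass is fixed at 1.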

English abstract:

This paper presents a model called the measure-valued Markov decision process (MVMDP), in which the agent's understanding of the environment is represented by the mathematical notion of a measure. The agent decides its optimal action according to this measure and thereby acquires its optimal policy. We therefore present an algorithm for finding the optimal policy under an MVMDP, which can also be regarded as an approximate optimal-policy algorithm for partially observed Markov decision processes (POMDPs). The model is a generalization of the POMDP; that is, the POMDP is a particular case of the MVMDP. Nevertheless, this work differs essentially from other papers on POMDPs. First, the main idea of general POMDP approaches is to transform a partially observable Markov decision problem on a physical state space into a regular Markov decision problem (MDP) on the corresponding belief state space, and such studies all identify the belief state with a probability distribution over the state space. Most POMDP models based on this idea therefore concentrate on algorithms of various kinds for finding the optimal policy and on novel refinements of existing techniques. Our work, however, is not based on the transformation between the POMDP on a physical state space and the MDP on a belief state space. Instead, we take the measure on the state space, a more general notion than the belief state, as a new object of study. The Markov decision problem we discuss then takes place on the space composed of these measures, which gives a measure-valued Markov decision process. Second, the MVMDP, based on the latest theory of measure-valued branching processes in modern probability, reflects an important characteristic of the human mind: people think about problems and choose their own optimal actions in contexts where all the possible states are grasped (i.e., they are able to appropria