Learning Control of Dynamical Systems Based on Markov Decision Processes: Research Frontiers and Outlooks

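ISSN: 0254-4156
Journal: Acta Automatica Sinica (自动化学报)
Year: 2012
Pages: 673-687
Classification: TP273.22 [Automation and Computer Technology: Control Science and Engineering; Automation and Computer Technology: Detection Technology and Automatic Equipment]
Author affiliations: [1] Institute of Automation, College of Mechatronics Engineering and Automation, National University of Defense Technology, Changsha 410073; [2] Institute of Automation, Chinese Academy of Sciences, Beijing 100190; [3] State Key Laboratory of Management and Control for Complex Systems, Beijing 100190; [4] Department of Systems and Industrial Engineering, University of Arizona, USA; [5] Research Center for Military Computational Experiments and Parallel Systems Technology, National University of Defense Technology, Changsha 410073
Funding: National Natural Science Foundation of China (61075072, 90820302, 60921061); Fok Ying Tung Education Foundation for Young Teachers, preferred funding project (114005); Program for New Century Excellent Talents in University, Ministry of Education (NCET-10-0901)
Related project: Structured reinforcement learning and its application to motion planning of virtual humans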

Keywords: learning control, Markov decision process, reinforcement learning, approximate dynamic programming, machine learning, adaptive control
DOI: 10.3724/SP.J.1004.2012.00673

Abstract:

Learning control of dynamical systems based on Markov decision processes (MDPs) is an interdisciplinary research area spanning machine learning, control theory, and operations research. Its main objective is data-driven, multi-stage optimal control of dynamical systems whose models are complex or uncertain. This paper surveys the theory, algorithms, and applications of MDP-based learning control of dynamical systems. Emphasis is placed on recent advances in the theory and methods of reinforcement learning (RL) and adaptive/approximate dynamic programming (ADP), including temporal-difference learning theory, value function approximation for MDPs with continuous state and action spaces, direct policy search, approximate policy iteration, and adaptive critic designs. Applications and trends for future research and development in related fields are also discussed.
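
As an illustration of the temporal-difference learning named in the abstract, the following is a minimal sketch of tabular TD(0) policy evaluation. The random-walk chain, the function name td0_random_walk, and all parameter values are assumptions chosen for illustration; none of them come from the surveyed paper.

```python
import random


def td0_random_walk(n_states=5, episodes=20000, alpha=0.02, gamma=1.0, seed=0):
    """Estimate state values of a random-walk chain with tabular TD(0).

    Hypothetical toy problem: states 0..n_states-1, each episode starts in
    the middle state and moves left or right with equal probability.
    Leaving the chain on the right yields reward 1; on the left, reward 0.
    """
    rng = random.Random(seed)
    values = [0.5] * n_states  # initial value estimates
    for _ in range(episodes):
        s = n_states // 2
        while True:
            s_next = s + rng.choice((-1, 1))
            if s_next < 0:
                reward, v_next, done = 0.0, 0.0, True
            elif s_next >= n_states:
                reward, v_next, done = 1.0, 0.0, True
            else:
                reward, v_next, done = 0.0, values[s_next], False
            # TD(0) update: move V(s) toward the one-step bootstrapped target.
            values[s] += alpha * (reward + gamma * v_next - values[s])
            if done:
                break
            s = s_next
    return values


if __name__ == "__main__":
    print([round(v, 2) for v in td0_random_walk()])
```

For this chain the exact values are (i + 1) / 6 for state i, so the learned estimates can be checked against a known answer. The methods covered by the survey replace the value table with function approximators so that continuous state and action spaces can be handled.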

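Journal information

Acta Automatica Sinica (自动化学报)
Chinese Science and Technology Core Journal

Supervising body: Chinese Academy of Sciences
Sponsors: Chinese Association of Automation; Institute of Automation, Chinese Academy of Sciences
Editor-in-chief: Fei-Yue Wang (王飞跃)
Address: 16 Donghuangchenggen North Street, Beijing
Postal code: 100717
Email: aas@ia.ac.cn
Phone: 010-64019820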

International standard serial number: ISSN 0254-4156
Domestic serial number: CN 11-2109/TP
Postal distribution code: 2-180

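Awards:
National Outstanding Journal Award (1997); Second Prize, Chinese Academy of Sciences Outstanding Journal Award (1985, 1990, 1996, 2000); National Journal Award (2002)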

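Indexed in (domestic and international databases):
Mathematical Reviews, online edition (USA); Zentralblatt MATH (Germany); Scopus abstract and citation database (Netherlands); Engineering Index (USA); Japan Science and Technology Agency database (Japan); Chinese Science and Technology Core Journals (China); Peking University Core Journals (China; 2000, 2004, 2008, 2011, and 2014 editions)

Times cited: 27550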