This paper investigates the direct adaptive optimal control problem for the discrete-time jump linear quadratic (JLQ) model. Reinforcement learning theory and methods are applied to the JLQ model, and a Q-function-based policy iteration algorithm is designed to optimize system performance. When the system parameters and the mode transition probabilities are unknown, the parameter matrix of the Q function is estimated online with a recursive least squares algorithm by observing the system's behavior under a given control policy. Based on this matrix, a new policy that improves system performance is constructed. The algorithm converges to the optimal policy.
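The abstract describes a two-step loop: first estimate the Q-function parameter matrix for the current policy from observed data, then build an improved policy from that matrix. Below is a minimal Python/NumPy sketch of that loop for a Markov jump linear system. It rests on assumptions of our own: the mode r_k is observed, the stage cost is x'Q_i x + u'R_i u, Q_i(x,u) = [x;u]' H_i [x;u], the improved policy is u = -K_i x with K_i = H_uu^{-1} H_ux, and the initial policy is mean-square stabilizing. It is not the paper's algorithm. For the evaluation step, it replaces the paper's recursive least squares with a batch instrumental-variable least-squares solve (LSTD-Q style) on i.i.d. exploratory samples. The instrument keeps the estimate consistent even though the next mode is random. All function names are hypothetical. The model matrices and transition probabilities are used only to simulate transitions and, in `riccati_gains`, to compute a model-based reference for comparison.

```python
import numpy as np


def quad_features(Z):
    """Row-wise features f(z) with z^T H z = f(z) @ h, where h holds the upper triangle of symmetric H."""
    i, j = np.triu_indices(Z.shape[1])
    scale = np.where(i == j, 1.0, 2.0)  # off-diagonal entries appear twice in z^T H z
    return Z[:, i] * Z[:, j] * scale


def unpack(h, d):
    """Rebuild the symmetric d x d matrix H from its upper-triangular parameters h."""
    H = np.zeros((d, d))
    H[np.triu_indices(d)] = h
    return H + H.T - np.diag(np.diag(H))


def evaluate_q(K, A, B, Qc, R, Pt, rng, n_samples=200_000):
    """Estimate H_i in Q_i(x, u) = [x; u]^T H_i [x; u] for the policy u = -K_i x.

    The learner only uses sampled tuples (r, x, u, cost, r', x'); A, B and the
    transition matrix Pt appear here purely to simulate those tuples.
    """
    M, n, m = A.shape[0], A.shape[1], B.shape[2]
    d, N = n + m, n_samples
    p = d * (d + 1) // 2
    # Exploratory data: random (observed) mode, state and input; off-policy samples.
    r = rng.integers(M, size=N)
    X = rng.standard_normal((N, n))
    U = rng.standard_normal((N, m))
    cost = np.einsum('ki,kij,kj->k', X, Qc[r], X) + np.einsum('ki,kij,kj->k', U, R[r], U)
    Xn = np.einsum('kij,kj->ki', A[r], X) + np.einsum('kij,kj->ki', B[r], U)
    # Next mode drawn from row r of the transition matrix (unknown to the learner).
    cum = np.cumsum(Pt, axis=1)
    rn = np.minimum((rng.random(N)[:, None] > cum[r]).sum(axis=1), M - 1)
    Un = -np.einsum('kij,kj->ki', K[rn], Xn)  # action of the evaluated policy at x'

    def block(modes, Z):
        # Stacked features over modes: only the block of the active mode is nonzero.
        return (np.eye(M)[modes][:, :, None] * quad_features(Z)[:, None, :]).reshape(N, M * p)

    psi = block(r, np.hstack([X, U]))
    phi_next = block(rn, np.hstack([Xn, Un]))
    # Bellman equation Q_r(x,u) - Q_r'(x', -K_r' x') = cost holds in expectation over r';
    # psi (a function of r, x, u only) serves as the instrument.
    h = np.linalg.solve(psi.T @ (psi - phi_next), psi.T @ cost)
    return np.array([unpack(h[i * p:(i + 1) * p], d) for i in range(M)])


def policy_iteration(K0, A, B, Qc, R, Pt, iters=8, seed=0):
    """Alternate Q-function evaluation and greedy improvement K_i = H_uu^{-1} H_ux."""
    A, B, Qc, R, Pt = (np.asarray(a, dtype=float) for a in (A, B, Qc, R, Pt))
    K = np.asarray(K0, dtype=float)
    n = A.shape[1]
    rng = np.random.default_rng(seed)
    for _ in range(iters):
        H = evaluate_q(K, A, B, Qc, R, Pt, rng)
        K = np.array([np.linalg.solve(Hi[n:, n:], Hi[n:, :n]) for Hi in H])
    return K


def riccati_gains(A, B, Qc, R, Pt, iters=2000):
    """Model-based reference (needs A, B, Pt): value iteration on the coupled Riccati equations."""
    A, B, Qc, R, Pt = (np.asarray(a, dtype=float) for a in (A, B, Qc, R, Pt))
    X = np.zeros_like(Qc)
    for _ in range(iters):
        E = np.einsum('ij,jkl->ikl', Pt, X)  # E_i = sum_j p_ij X_j
        K = np.array([np.linalg.solve(R[i] + B[i].T @ E[i] @ B[i], B[i].T @ E[i] @ A[i])
                      for i in range(len(A))])
        X = np.array([Qc[i] + A[i].T @ E[i] @ (A[i] - B[i] @ K[i]) for i in range(len(A))])
    return K


if __name__ == "__main__":
    # Hypothetical two-mode example; both A_i are contractive, so K0 = 0 is stabilizing.
    A = [[[0.8, 0.2], [0.0, 0.6]], [[0.5, 0.0], [0.3, 0.9]]]
    B = [[[0.0], [1.0]], [[1.0], [0.5]]]
    Qc, R = [np.eye(2)] * 2, [np.eye(1)] * 2
    Pt = [[0.7, 0.3], [0.4, 0.6]]
    print("learned K:", policy_iteration(np.zeros((2, 1, 2)), A, B, Qc, R, Pt))
    print("Riccati K:", riccati_gains(A, B, Qc, R, Pt))
```

The script prints the learned gains next to the model-based Riccati gains. With enough samples the two should agree closely, which is the sense in which policy iteration reaches the optimal policy in this sketch. A recursive variant closer in spirit to the abstract would fold each new sample into the same instrumented normal equations with a rank-one update instead of one batch solve.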