在基于性能势的随机逼近方法中引入双时间尺度的概念,提出了离散时间Markov控制过程的基于性能势的双时间尺度仿真梯度算法,弥补了传统算法中每步更新算法更新频率过快和更新环更新算法更新频率过慢的不足,并利用三个数值例子来说明双时间尺度更新算法在计算复杂度、收敛速度和收敛精度上的优势。
A novel two time-scale simulation-based gradient algorithm based on performance potential for discrete time Markov decision process was proposed, by introducing the concept of two time-scale into the performance potential based stochastic approximation. This algorithm tackles the limitations in classical approaches that the every-update simulation- based gradient algorithm updates too frequently, and the regenerative-update gradient algorithm updates too infrequently. Three numerical examples illustrate the superiority of two time-scale simulation-based gradient algorithm in computational complexity, convergence speed and convergence precision.