Applying traditional reinforcement learning algorithms to adaptive traffic signal control at intersections suffers from the curse of dimensionality: the size of the state and action spaces grows exponentially with the number of intersections. To address this, the adaptive traffic signal control problem is formulated as a Markov decision process (MDP), and feature-based state representation is combined with linear average function approximation to reduce computational complexity while ensuring convergence. In simulation experiments on a multi-intersection network, the proposed algorithm outperforms both fixed-time control and traditional reinforcement learning under different traffic demand levels and vehicle arrival distributions, and both its parameter θ and its number of learning steps converge.
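To make the idea of a feature-based state representation with a linear value estimate concrete, the following is a minimal sketch of generic Q-learning with linear function approximation. It is not the paper's algorithm: the abstract does not specify the feature set, the cost signal, or the exact form of the "linear average" estimator. The feature map (normalized per-lane queue lengths plus a bias term), the cost-minimizing update, and the names `features`, `q_value`, `td_update`, `max_queue`, `alpha`, and `gamma` are all assumptions introduced for illustration.

```python
import numpy as np

def features(queues, action, n_actions, max_queue=20.0):
    """Hypothetical feature map (an assumption, not taken from the paper).

    Normalized queue lengths plus a bias term are written into the block of
    the feature vector that belongs to the chosen signal phase (action).
    """
    scaled = np.clip(np.asarray(queues, dtype=float) / max_queue, 0.0, 1.0)
    base = np.append(scaled, 1.0)  # bias term
    phi = np.zeros(n_actions * base.size)
    phi[action * base.size:(action + 1) * base.size] = base
    return phi

def q_value(theta, queues, action, n_actions):
    """Linear estimate Q(s, a) = theta . phi(s, a)."""
    return float(theta @ features(queues, action, n_actions))

def td_update(theta, queues, action, cost, next_queues, n_actions,
              alpha=0.05, gamma=0.9):
    """One Q-learning step on theta, assuming the agent minimizes a cost
    (for example, total queue length); hence the min over next actions."""
    phi = features(queues, action, n_actions)
    next_best = min(q_value(theta, next_queues, a, n_actions)
                    for a in range(n_actions))
    td_error = cost + gamma * next_best - float(theta @ phi)
    return theta + alpha * td_error * phi

# Illustrative use: one intersection, 4 approach lanes, 2 signal phases.
n_actions = 2
theta = np.zeros(n_actions * (4 + 1))
theta = td_update(theta, [10, 20, 0, 5], 1, 1.0, [8, 18, 1, 5], n_actions)
```

In this sketch the learned parameter vector has one entry per feature and action, so its length grows linearly with the number of lanes and intersections, whereas a lookup table over joint queue states would grow exponentially.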