Developing a learning model plays an important role in improving the adaptability and robustness of traffic control systems. In this paper, reinforcement learning is used to give the traffic control model learning capability. Signal timing at a single intersection is studied under both fixed-cycle and variable-cycle modes. The paper first constructs a reward function for the equal-saturation objective, and then establishes off-line Q-learning models for two optimization objectives: equal saturation and delay minimization. Discretizing the flow rates is used to resolve the state-dimension explosion problem. Through a numerical example, the solution structure and the distribution of optimal solutions of the four off-line Q-learning models are analyzed; the results show that, compared with the on-line Q-learning model, the off-line Q-learning model is better suited to intersection signal timing optimization. Finally, following an "off-line learning, on-line application" approach, the fixed-cycle delay-minimization off-line Q-learning model is compared with the Webster fixed-cycle model; overall, the former achieves lower average vehicle delay and lower cumulative delay than the latter.
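The off-line tabular Q-learning idea summarized above can be sketched as a toy example. Everything below is an illustrative assumption rather than the paper's actual formulation: the flow bins, the green-split action set, the saturation-based reward, and all function names are invented for exposition, and each recorded flow sample is treated as a one-step episode, so the Q-learning update reduces to its myopic form.

```python
import random

# Toy sketch of off-line tabular Q-learning for single-intersection signal
# timing. All constants, names, and the reward shape are illustrative
# assumptions, not the paper's actual models.

FLOW_BINS = [300, 600, 900]               # veh/h thresholds for discretizing flow
GREEN_SPLITS = [0.3, 0.4, 0.5, 0.6, 0.7]  # candidate green-time ratios (actions)

def discretize(flow):
    """Map a continuous flow rate to a small bin index (curbs state-dimension explosion)."""
    for i, threshold in enumerate(FLOW_BINS):
        if flow < threshold:
            return i
    return len(FLOW_BINS)

def reward(flow_ns, flow_ew, split, capacity=1800.0):
    """Equal-saturation-style reward: penalize the gap between the degrees of
    saturation x = q / (capacity * green_ratio) of the two competing approaches."""
    x_ns = flow_ns / (capacity * split)
    x_ew = flow_ew / (capacity * (1.0 - split))
    return -abs(x_ns - x_ew)

def offline_q_learning(samples, episodes=200, alpha=0.1, eps=0.1, seed=0):
    """Learn a Q-table from a fixed batch of recorded (NS, EW) flow samples
    ("off-line learning"). Each sample is treated as a one-step episode, so the
    Q-learning update reduces to Q(s,a) += alpha * (r - Q(s,a))."""
    q = {}
    rng = random.Random(seed)
    n_actions = len(GREEN_SPLITS)
    for _ in range(episodes):
        for flow_ns, flow_ew in samples:
            s = (discretize(flow_ns), discretize(flow_ew))
            if rng.random() < eps:                      # epsilon-greedy exploration
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda k: q.get((s, k), 0.0))
            r = reward(flow_ns, flow_ew, GREEN_SPLITS[a])
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + alpha * (r - old)
    return q

def best_split(q, flow_ns, flow_ew):
    """"On-line application": greedily read the learned green split for current flows."""
    s = (discretize(flow_ns), discretize(flow_ew))
    a = max(range(len(GREEN_SPLITS)), key=lambda k: q.get((s, k), 0.0))
    return GREEN_SPLITS[a]
```

For a recorded sample with a heavier north-south flow, e.g. `(900, 300)` veh/h, the learned greedy policy settles on the largest candidate split for the heavier approach, mirroring the equal-saturation principle of giving more green time to the more saturated movement.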