Radial basis function (RBF) network approximation models can effectively solve reinforcement learning problems with continuous state spaces. However, the online nature of reinforcement learning exposes RBF network approximation models to "catastrophic interference": the input-output mapping learned from past samples is easily destroyed when the model is trained on new samples. To address the catastrophic interference problem of RBF network approximation models, we propose a collaborative Q-V value function approximation model based on an adaptive normalized RBF (ANRBF) network, together with a corresponding collaborative approximation algorithm, QV(λ). The algorithm normalizes the feature vector produced by the RBFs and adaptively adjusts the number, centers, and widths of the ANRBF network's hidden-layer nodes online, which effectively improves the approximation model's resistance to interference and its flexibility. In the collaborative approximation model, the Q and V value functions jointly shape the TD error, which exploits some prior knowledge of the environment model and thus effectively improves the algorithm's convergence speed and initial performance. We analyze the convergence of QV(λ) theoretically, and extensive experiments show that QV(λ) achieves better performance than other function approximation algorithms.
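To make the abstract's two core ideas concrete, the following is a minimal Python sketch, assuming a linear value function over normalized Gaussian RBF features. All names and hyperparameters (centers, widths, alpha, beta, gamma, lam) are illustrative rather than the paper's notation; the update follows the standard QV(λ) scheme in which Q and V share a V-based TD target, and it omits the paper's online adaptation of the hidden layer's node count, centers, and widths, so the paper's ANRBF variant may differ in detail.

```python
import numpy as np

def normalized_rbf_features(state, centers, widths):
    """Gaussian RBF activations, normalized to sum to 1.

    Normalization bounds each feature in [0, 1], limiting how far a
    single new sample can perturb weights tied to distant centers --
    the abstract's defense against catastrophic interference.
    """
    diffs = centers - state                        # (n_nodes, state_dim)
    sq_dist = np.sum(diffs ** 2, axis=1)           # (n_nodes,)
    activations = np.exp(-sq_dist / (2.0 * widths ** 2))
    return activations / (np.sum(activations) + 1e-12)

def qv_lambda_update(w_v, w_q, phi, action, reward, phi_next,
                     z_v, alpha=0.1, beta=0.1, gamma=0.99, lam=0.8):
    """One QV(lambda) step with linear function approximation.

    V(s) = w_v . phi(s);  Q(s, a) = w_q[a] . phi(s).
    Both updates share the TD target r + gamma * V(s'), so the V
    estimate shapes the TD error used to train Q.
    """
    v_s = w_v @ phi
    v_next = w_v @ phi_next
    target = reward + gamma * v_next

    # V update with an accumulating eligibility trace z_v (the lambda part).
    z_v = gamma * lam * z_v + phi
    w_v = w_v + beta * (target - v_s) * z_v

    # Q update for the taken action, driven by the same V-based target.
    q_sa = w_q[action] @ phi
    w_q[action] = w_q[action] + alpha * (target - q_sa) * phi
    return w_v, w_q, z_v
```

The design choice shown here is that Q is trained against a target built from V rather than from max-over-actions Q values; because V aggregates experience across all actions taken in a state, it typically stabilizes earlier, which is one reading of the abstract's claim about improved convergence speed and initial performance.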