An adaptive heuristic critic (AHC) reinforcement learning algorithm based on the normalized radial basis function (NRBF) is presented for autonomous dynamic spectrum allocation in heterogeneous radio networks. The algorithm uses the NRBF to construct the state space adaptively, which speeds up learning, and uses the AHC scheme to reduce unnecessary exploration, which improves learning efficiency. Through interaction with the radio environment, it learns to dynamically allocate a suitable frequency band to each session in the different radio access networks. Simulation results show that, under the same network conditions, the proposed algorithm achieves better spectrum utilization and quality of service than either the fixed (deterministic) frequency planning scheme or a general dynamic spectrum allocation policy.
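The abstract does not give the update rules, so the following is only a minimal sketch of how an NRBF state representation and an AHC (actor-critic) learner might fit together, not the authors' implementation. Everything concrete in it is an assumption made for illustration: the Gaussian kernel with a shared `width`, the fixed `centers` (the paper constructs the state space adaptively, which is not reproduced here), the softmax action selection, the step sizes `alpha` and `beta`, the discount `gamma`, and the one-dimensional toy "load" state with a hand-made reward in `train_toy`.

```python
import numpy as np


def nrbf_features(state, centers, width):
    """Normalized RBF activations: Gaussian kernels divided by their sum."""
    sq_dist = np.sum((centers - state) ** 2, axis=1)
    # Subtracting the minimum distance avoids underflow; normalization cancels it.
    g = np.exp(-(sq_dist - sq_dist.min()) / (2.0 * width ** 2))
    return g / g.sum()


class AHCAgent:
    """Hypothetical actor-critic learner on NRBF features (illustrative only)."""

    def __init__(self, centers, n_actions, width=0.2,
                 alpha=0.1, beta=0.1, gamma=0.9, seed=0):
        self.centers = np.asarray(centers, dtype=float)
        self.width = width
        self.alpha = alpha   # critic step size (assumed value)
        self.beta = beta     # actor step size (assumed value)
        self.gamma = gamma   # discount factor (assumed value)
        self.v = np.zeros(len(self.centers))                   # critic weights
        self.theta = np.zeros((len(self.centers), n_actions))  # actor preferences
        self.rng = np.random.default_rng(seed)

    def features(self, state):
        state = np.asarray(state, dtype=float)
        return nrbf_features(state, self.centers, self.width)

    def policy(self, state):
        # Softmax over linear action preferences.
        prefs = self.features(state) @ self.theta
        p = np.exp(prefs - prefs.max())
        return p / p.sum()

    def act(self, state):
        return int(self.rng.choice(self.theta.shape[1], p=self.policy(state)))

    def update(self, state, action, reward, next_state):
        phi = self.features(state)
        phi_next = self.features(next_state)
        # The critic's TD error serves as the heuristic reinforcement signal.
        td_error = reward + self.gamma * (self.v @ phi_next) - self.v @ phi
        self.v += self.alpha * td_error * phi
        # The actor reinforces the chosen action in proportion to the TD error.
        self.theta[:, action] += self.beta * td_error * phi
        return td_error


def train_toy(agent, steps=2000):
    """Toy loop: state is a load in [0, 1]; action is a band index (made up)."""
    load = agent.rng.random(1)
    for _ in range(steps):
        action = agent.act(load)
        target = min(int(load[0] * agent.theta.shape[1]), agent.theta.shape[1] - 1)
        reward = 1.0 if action == target else 0.0  # hand-made reward, not from the paper
        next_load = agent.rng.random(1)
        agent.update(load, action, reward, next_load)
        load = next_load
    return agent
```

A typical call would be `train_toy(AHCAgent(np.linspace(0.0, 1.0, 5).reshape(-1, 1), n_actions=3))`. Because the NRBF activations sum to one, each state is represented as a convex blend of nearby centers, which is one plausible reason a small number of basis functions can cover a continuous load space.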