In opportunistic spectrum access (OSA) systems, channels must be selected without prior knowledge of the channel statistics. To address this problem, a channel selection strategy is proposed that uses an improved upper confidence bound (UCB) index based on the multi-armed bandit (MAB) model. By introducing the reward variance into the confidence factor of the UCB index, the strategy adjusts how the unknown channel environment is explored and thereby reduces the cost of exploration. It is proved theoretically that the proposed strategy converges faster and that its learning regret grows slowly, approximately logarithmically in the number of time slots. Simulation results show that, compared with the original UCB strategy and the greedy algorithm, the proposed strategy adaptively selects channels with better availability, reduces the learning regret, and accelerates its convergence, thus improving system throughput.
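The abstract does not state the exact form of the improved index, so the following is only a minimal sketch of the general idea, assuming the well-known UCB1-Tuned index as a stand-in for a variance-aware confidence factor. The channel idle probabilities, the horizon, and all function names (`ucb1_index`, `ucb_tuned_index`, `simulate`) are hypothetical choices made for illustration, and channel availability is modeled as a Bernoulli reward.

```python
import math
import random


def ucb1_index(mean, n, t):
    """Classic UCB1 index: empirical mean plus a confidence bonus."""
    return mean + math.sqrt(2.0 * math.log(t) / n)


def ucb_tuned_index(mean, var, n, t):
    """Variance-aware index in the UCB1-Tuned form (an assumed stand-in,
    not necessarily the index proposed in the paper)."""
    v = var + math.sqrt(2.0 * math.log(t) / n)
    # 1/4 is the largest possible variance of a reward in [0, 1].
    return mean + math.sqrt(math.log(t) / n * min(0.25, v))


def simulate(probs, horizon, use_variance, seed=0):
    """Run one bandit episode; return (cumulative pseudo-regret, pull counts).

    probs[i] is the hypothetical probability that channel i is idle.
    """
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k
    sums = [0.0] * k
    best = max(probs)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # sense every channel once to initialize
        else:
            indices = []
            for i in range(k):
                mean = sums[i] / counts[i]
                if use_variance:
                    # For 0/1 rewards the sample variance is mean * (1 - mean).
                    var = mean * (1.0 - mean)
                    indices.append(ucb_tuned_index(mean, var, counts[i], t))
                else:
                    indices.append(ucb1_index(mean, counts[i], t))
            arm = max(range(k), key=indices.__getitem__)
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - probs[arm]  # expected loss versus the best channel
    return regret, counts


if __name__ == "__main__":
    channels = [0.2, 0.5, 0.9]  # hypothetical idle probabilities
    for flag in (False, True):
        r, c = simulate(channels, 2000, use_variance=flag)
        print("variance-aware" if flag else "UCB1", round(r, 1), c)
```

In this form the variance term can only shrink the confidence bonus relative to UCB1, which is one way a low-variance channel may be explored less often; whether the resulting regret matches the behavior reported above depends on the actual index used in the paper.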