Most existing studies of Opportunistic Spectrum Access (OSA) under a power constraint model the environment as a fully observable Markov Decision Process (MDP) in order to improve performance indicators of either the physical layer or the Medium Access Control (MAC) layer. Because of the limitations of sensing equipment, however, users cannot be guaranteed full knowledge of the environment. To address this problem, this paper proposes a cross-layer OSA optimization design based on the Partially Observable MDP (POMDP) and Sarsa(λ). Combining the MAC layer and the physical layer, the spectrum sensing and access process of a power-constrained secondary user with sensing errors is modeled as a POMDP, which is converted into a Belief-state MDP (BMDP) and solved with the Sarsa(λ) algorithm. Simulation results show that, under the power constraint, the effective transmission capacity, throughput and spectrum utilization of the proposed Sarsa(λ)-BMDP scheme are about 9%, 7% and 3% lower, respectively, than those of the fully observable Q-MDP scheme, while its bit error rate is about 20% lower than that of the Point-Based Value Iteration (PBVI)-POMDP scheme and about 16% higher than that of the Q-MDP scheme.
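The conversion from a POMDP to a belief-state MDP rests on replacing the hidden channel state with a belief that is updated after each imperfect sensing result. The following is only a minimal sketch of such an update and is not taken from the paper: it assumes a single channel with two states (idle or busy), and the transition probabilities `p11` and `p01`, the false-alarm probability `p_false_alarm`, and the miss-detection probability `p_miss` are hypothetical parameters introduced here for illustration.

```python
def predict_idle(belief_idle, p11, p01):
    """Propagate the belief one slot through the assumed two-state Markov chain.

    p11: P(idle in next slot | idle now), p01: P(idle in next slot | busy now).
    """
    return belief_idle * p11 + (1.0 - belief_idle) * p01


def update_belief(belief_idle, observed_idle, p11, p01, p_false_alarm, p_miss):
    """Bayes update of P(channel idle) after one imperfect sensing result.

    p_false_alarm: P(sensed busy | channel idle)  (assumed parameter)
    p_miss:        P(sensed idle | channel busy)  (assumed parameter)
    """
    prior = predict_idle(belief_idle, p11, p01)
    if observed_idle:
        like_idle, like_busy = 1.0 - p_false_alarm, p_miss
    else:
        like_idle, like_busy = p_false_alarm, 1.0 - p_miss
    norm = prior * like_idle + (1.0 - prior) * like_busy
    if norm == 0.0:
        # The observation has zero probability under the model; keep the prior.
        return prior
    return prior * like_idle / norm


if __name__ == "__main__":
    b = 0.5
    for sensed_idle in (True, True, False):
        b = update_belief(b, sensed_idle, p11=0.8, p01=0.2,
                          p_false_alarm=0.1, p_miss=0.2)
        print(f"sensed_idle={sensed_idle}, belief_idle={b:.4f}")
```

In a sketch like this, the updated belief would serve as the state of the BMDP, so a learning algorithm such as Sarsa(λ) could operate on a discretized or function-approximated belief instead of the unobservable channel state.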