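To address the low utilization of computing resources in standard CUDA ray-casting volume rendering, where unequal per-thread workloads within a warp cause warp divergence, a CUDA warp marching algorithm is proposed. First, the causes of warp divergence in the standard CUDA implementation are analyzed, and ray integration is mapped onto warps: all threads in a warp integrate the ray synchronously, segment by segment, until the ray terminates, which avoids warp divergence. Then, an algorithm for integrating a ray within a warp is presented, based on the mathematical principles of ray integration and the hardware characteristics of the GPU. Finally, to overcome the load imbalance caused by static assignment of tasks to warps, an algorithm for dynamic warp task assignment is presented. Experimental results show that warp marching with dynamic task assignment achieves a speedup of 1.9 to 7.9 times over the standard CUDA implementation.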
A CUDA warp marching method for ray-casting volume rendering is proposed to address the low utilization of computational resources that results from warp divergence, which in turn is caused by irregular workloads among the threads of a warp. We first analyze the causes of warp divergence in the standard CUDA implementation. Warp divergence is eliminated by integrating each ray with all the threads of a warp, which execute instructions in lock-step. An algorithm for integrating a single ray within one warp is then introduced, after detailing the mathematical principles of ray integration and the relevant GPU hardware characteristics. A dynamic work scheduling strategy is also incorporated into warp marching to further improve overall performance by better balancing the workloads of the streaming multiprocessors. Experimental results indicate that the dynamic warp marching method achieves a speedup of 1.9 to 7.9 times over the standard CUDA implementation.
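The idea of integrating one ray with all the threads of a warp can be illustrated with a small CPU-side sketch in Python. It is not the CUDA kernel described in this paper: the function names (`over`, `march_serial`, `march_warp`), the choice of one sample per lane per step, and the early-termination threshold are assumptions made for illustration. The sketch relies only on the fact that front-to-back compositing of premultiplied (color, opacity) pairs is associative, so the partial results of the lanes can be merged in lane order by a tree reduction, which stands in here for a warp-level reduction on the GPU.

```python
WARP_SIZE = 32  # lanes per warp on current NVIDIA GPUs


def over(front, back):
    """Composite two premultiplied (color, alpha) pairs, front over back."""
    cf, af = front
    cb, ab = back
    return (cf + (1.0 - af) * cb, af + (1.0 - af) * ab)


def march_serial(samples, threshold=0.99):
    """Reference: a single thread walks the whole ray, sample by sample."""
    acc = (0.0, 0.0)
    for color, alpha in samples:
        acc = over(acc, (color * alpha, alpha))
        if acc[1] >= threshold:  # early ray termination
            break
    return acc


def march_warp(samples, threshold=0.99):
    """Emulated warp: every step, WARP_SIZE lanes take one sample each."""
    acc = (0.0, 0.0)
    for start in range(0, len(samples), WARP_SIZE):
        batch = samples[start:start + WARP_SIZE]
        # Each lane premultiplies its own sample (all lanes do the same work).
        lanes = [(c * a, a) for c, a in batch]
        # Order-preserving tree reduction; valid because `over` is associative.
        while len(lanes) > 1:
            merged = [over(lanes[i], lanes[i + 1])
                      for i in range(0, len(lanes) - 1, 2)]
            if len(lanes) % 2:
                merged.append(lanes[-1])
            lanes = merged
        acc = over(acc, lanes[0])
        # The whole warp tests termination once per step, so no lane diverges.
        if acc[1] >= threshold:
            break
    return acc
```

With early termination disabled (a threshold above 1), `march_warp` and `march_serial` agree up to floating-point rounding. With it enabled, the emulated warp can only stop at a multiple of `WARP_SIZE` samples, so it may composite a few more samples than the serial loop. That small overshoot is the price of keeping all lanes in lock-step.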