This paper studies improvements to the parallel Particle Swarm Optimization (PSO) algorithm on the Graphics Processing Unit (GPU) using the Compute Unified Device Architecture (CUDA). From the characteristics of the CUDA hardware architecture, it follows that blocks are executed serially and that the warp is the basic unit scheduled and executed by a Streaming Multiprocessor (SM). To make full use of the thread parallelism within a block, a GPU-parallel PSO algorithm based on adaptive warps is proposed: each dimension of a particle is mapped to a thread; exploiting the warp-level parallelism of the GPU, each particle is adaptively mapped to one or more warps according to its dimensionality; and one or more particles are adaptively mapped to each block. The proposed method is compared with the existing coarse-grained parallel approach (one particle per thread) and the existing fine-grained parallel approach (one particle per block). Experimental results show that, relative to these two approaches, the proposed method increases the speed-up ratio over the CPU by up to 40.
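The adaptive mapping described above (dimension to thread, particle to one or more warps, one or more particles to a block) can be illustrated with a small host-side sketch in C++. It is not the authors' implementation. The names `LaunchLayout`, `adaptiveWarpLayout`, and `roleOf` are hypothetical, and the rule for the number of particles per block (pack as many as fit under an assumed limit of 1024 threads per block) is an assumption, since the abstract does not state how that number is chosen. The warp size of 32 matches current NVIDIA GPUs.

```cpp
#include <cassert>

constexpr int kWarpSize = 32;              // warp width on current NVIDIA GPUs
constexpr int kMaxThreadsPerBlock = 1024;  // assumed per-block thread limit

// Hypothetical description of how particles are laid out over warps and blocks.
struct LaunchLayout {
    int warpsPerParticle;    // whole warps reserved for one particle
    int threadsPerParticle;  // warpsPerParticle * kWarpSize
    int particlesPerBlock;   // particles packed into one block
    int threadsPerBlock;     // particlesPerBlock * threadsPerParticle
    int numBlocks;           // blocks needed to cover the whole swarm
};

// Choose the layout from the swarm size and the problem dimensionality.
LaunchLayout adaptiveWarpLayout(int numParticles, int dim,
                                int maxThreadsPerBlock = kMaxThreadsPerBlock) {
    assert(numParticles > 0 && dim > 0);
    // Round the dimensionality up to a whole number of warps.
    int warpsPerParticle = (dim + kWarpSize - 1) / kWarpSize;
    int threadsPerParticle = warpsPerParticle * kWarpSize;
    // This sketch does not handle a particle that spans several blocks.
    assert(threadsPerParticle <= maxThreadsPerBlock);
    // Assumption: pack as many particles as fit, but never more than exist.
    int particlesPerBlock = maxThreadsPerBlock / threadsPerParticle;
    if (particlesPerBlock > numParticles) particlesPerBlock = numParticles;
    int threadsPerBlock = particlesPerBlock * threadsPerParticle;
    int numBlocks = (numParticles + particlesPerBlock - 1) / particlesPerBlock;
    return {warpsPerParticle, threadsPerParticle, particlesPerBlock,
            threadsPerBlock, numBlocks};
}

// Which particle and dimension a given thread would work on.
struct ThreadRole {
    int particle;
    int dimension;
    bool active;  // false for padding threads beyond the swarm or the dimension
};

ThreadRole roleOf(const LaunchLayout& layout, int blockIndex, int threadIndex,
                  int numParticles, int dim) {
    int particle = blockIndex * layout.particlesPerBlock +
                   threadIndex / layout.threadsPerParticle;
    int dimension = threadIndex % layout.threadsPerParticle;
    bool active = particle < numParticles && dimension < dim;
    return {particle, dimension, active};
}
```

In a CUDA kernel, `blockIdx.x` and `threadIdx.x` would take the place of `blockIndex` and `threadIndex`. Because every particle starts on a warp boundary, the threads of one warp never mix dimensions of different particles; the cost is that the padding threads in a particle's last warp stay idle whenever the dimensionality is not a multiple of 32.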