研究了现代硬件上的并行内存排序方法,对其研究现状与进展进行了综述.首先简要阐述了经典排序算法以及排序网络的优缺点,分析其并行优化的适用性,然后从现代CPU处理器设备(多核、配备大内存)、图形处理器(GPU)、现场可编程逻辑门阵列(FPGA)等新型处理器设备介绍现有排序方法的研究成果.处理器设备的架构不同,对排序算法的优化策略也不同,现代CPU主要利用线程的本地存储层次优化数据在存储单元中的排列,以减少访存次数及减少访存缺失,同时利用单指令多数据流技术(SIMD),以提高算法的数据级并行度;GPU则需要将多个线程组织成线程块,依靠共享内存提高线程块的访存速度,而在线程块内则使用单指令多线程(SIMT)技术提高线程的执行效率;FPGA则更靠近于硬件底层,受到自身的资源限制,FPGA的优化策略主要依靠硬件描述语言或高级综合语言优化电路的设计,提高资源利用率的同时增加FPGA的吞吐量.现有的成果表明,GPU的并行内存排序性能优于CPU端上的并行内存排序性能.作者最后对未来的研究方向进行了展望.
The research achievements of parallel in-memory sorting method on modern hardware are summarized in this paper. Firstly, the advantages and disadvantages of classical sorting algorithms and sorting network are briefly reviewed and their applicability of parallel optimization is analyzed. Then the state-of-the-art sorting methods implemented on the modern CPU processor device (multicore, equipped with large memory), Graphics Processing Unit (GPU), Field- Programmable Gate Array (FPGA) and other new processor equipments are introduced. Different optimization strategies about sorting algorithm are utilized on different architecture of processors. Modern CPUs mainly utilize thread local memory level with aligned data in order to reduce the frequency of memory access and memory miss. To improve data level parallelism of sorting methods, Single Instruction Multiple Data (SIMD) technology is also involved; Threads of a GPU are organized into thread blocks, using Shared Memory to improve the speed of memory access. Single Instruction Multiple Threads(SIMT) is utilized in thread blocks to improve the execution efficiency of threads; Compared to CPU and GPU, the FPGA is closer to the underlyinghardware, limited by its own resources. Therefore optimization strategies of the FPGA are to optimize the design of circuit using hardware description language or high level synthesis language to make the use of resources more efficient and improve the through of FPGA. According to recent achievements, GPUs have a better performance than CPUs on sorting. Finally, the research interests for further study are proposed in the final section.