This paper presents a fine-grained pipelined algorithm for LU decomposition with column partial pivoting and describes its implementation on a field-programmable gate array (FPGA). While performing column partial pivoting, the algorithm fully exploits the data reuse inherent in LU decomposition to reduce the number of memory reads and writes (the I/O cost). The critical operations are fully pipelined at fine granularity, which improves decomposition performance. Experimental results show an average speedup of 6 to 7 times over a software execution of the serial algorithm on a Celeron(R) 3.07 GHz general-purpose host processor. Compared with other FPGA implementations of LU decomposition, the proposed design improves computational accuracy and numerical stability through pivoting, while using relatively few resources and maintaining high decomposition efficiency.
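For readers unfamiliar with column partial pivoting, the following is a minimal serial sketch of LU decomposition with partial pivoting (PA = LU, unit lower-triangular L). It is not the paper's fine-grained FPGA pipeline. It only shows the per-step work such a pipeline must handle: scan column k for the largest-magnitude pivot, swap rows, then update the trailing submatrix. The function name `lu_partial_pivot` and the list-of-lists matrix representation are illustrative assumptions.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def lu_partial_pivot(a: Matrix) -> Tuple[List[int], Matrix, Matrix]:
    """Hypothetical helper: return (perm, L, U) such that row i of L*U equals a[perm[i]]."""
    n = len(a)
    u = [row[:] for row in a]          # working copy of A, overwritten to become U
    l = [[0.0] * n for _ in range(n)]  # multipliers; unit diagonal is set at the end
    perm = list(range(n))              # row permutation P stored as an index list
    for k in range(n):
        # Column partial pivoting: choose the row i >= k with the largest |u[i][k]|.
        p = max(range(k, n), key=lambda i: abs(u[i][k]))
        if u[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            # Swap rows of U, the multipliers already stored in L, and the permutation.
            u[k], u[p] = u[p], u[k]
            l[k], l[p] = l[p], l[k]
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            m = u[i][k] / u[k][k]      # |m| <= 1 because the pivot has the largest magnitude
            l[i][k] = m
            u[i][k] = 0.0              # eliminated entry, set exactly to zero
            for j in range(k + 1, n):
                u[i][j] -= m * u[k][j]  # update of the trailing submatrix
    for i in range(n):
        l[i][i] = 1.0
    return perm, l, u


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0]]
    perm, L, U = lu_partial_pivot(A)
    print(perm)  # [1, 0]: row 1 is chosen as the first pivot because |3| > |1|
```

Bounding every multiplier by 1 in magnitude is what gives partial pivoting its accuracy and stability advantage over unpivoted LU.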