矩阵乘法是数值分析以及图形图像处理算法的基础,通用的矩阵乘法加速器设计一直是嵌入式系统设计的研究热点。但矩阵乘法由于计算复杂度高,处理效率低,常常成为嵌入式系统运算速度的瓶颈。为了在嵌入式领域更好地使用矩阵乘法,提出了基于MPSoC(MultiProcessor System-on-Chip)的软硬件协同加速的架构。在MPSoC的架构下,一方面,设计了面向硬件约束的矩阵分块方法,从而实现了通用的矩阵乘法加速器系统;另一方面,通过利用MPSoC下的多核架构,提出了相应的任务划分和负载平衡调度算法,提高了并行效率和整体系统加速比。实验结果表明,所提架构及算法实现了通用的矩阵乘法计算,并且通过软硬件协同设计实现的多核并行调度算法与传统单核设计相比在计算效率方面得到了显著的提高。
Matrix multiplication is the basic algorithm of the numerical analysis,graphics and image processing.General matrix multiplication accelerator has always been a research focus in the embedded system design.However,due to the high complexity and the low processing efficiency,matrix multiplication becomes the bottleneck of computation speed of embedded systems.In order to use matrix multiplication in the embedded field,a synergy acceleration architecture of software and hardware based on MPSoC was proposed in this paper.With MPSoC architecture,the partitioning of the matrix considering hardware constraints is implemented in our HW/SW system to enable the computation of general matrix multiplications.The parallel computation with multiple cores and hardware function unit is realized with the load balance algorithms.Parallel efficiency and speed-up ratio are improved.The experimental results show that the proposed general matrix multiplication approach can achieve significant speed-up over the traditional approaches with single core.