To improve the performance of multi-core processors, this paper proposes a novel computing array design that builds on conventional hardware accelerators. The communication ports between the computing array and the multi-core processor are mapped into the address space of an extended register file, which tightly couples the array to the processor. The computing units are interconnected by a Network-on-Chip (NoC), so the array can be flexibly configured and highly shared among the processor's cores. A 1024-point Fast Fourier Transform (FFT) and an H.264 decoder were implemented on the experimental platform. Compared with pure software implementations, the results show that the scheme both improves processor performance and reduces power consumption.
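To make the register-mapped coupling concrete, the following is a minimal behavioral sketch in Python under assumed parameters: 32 general-purpose registers, extended registers numbered from 32 upward, and the first four extended registers mapped to array ports backed by FIFOs of depth 8. The names `Core`, `run_array`, `PORT_BASE`, and the stall-on-full/empty semantics are hypothetical and not taken from the paper. The `chain` argument is only a stand-in for a NoC route through the computing units; it does not model the NoC itself.

```python
from collections import deque

NUM_GPR = 32          # assumed: ordinary registers r0..r31
NUM_EXT = 16          # assumed: extended registers r32..r47
PORT_BASE = NUM_GPR   # assumed: r32..r35 are mapped to computing-array ports
NUM_PORTS = 4
FIFO_DEPTH = 8        # assumed port buffer depth


class Core:
    """Behavioral model of one core whose extended register file exposes array ports."""

    def __init__(self):
        # Non-port registers (ordinary or extended) behave as plain storage.
        self.regs = [0] * (NUM_GPR + NUM_EXT)
        self.to_array = [deque() for _ in range(NUM_PORTS)]    # core -> array
        self.from_array = [deque() for _ in range(NUM_PORTS)]  # array -> core

    def _port(self, idx):
        """Return the port number if register idx is port-mapped, else None."""
        if PORT_BASE <= idx < PORT_BASE + NUM_PORTS:
            return idx - PORT_BASE
        return None

    def write(self, idx, value):
        """Register write. Returns False when a port FIFO is full (the core would stall)."""
        port = self._port(idx)
        if port is None:
            self.regs[idx] = value
            return True
        if len(self.to_array[port]) >= FIFO_DEPTH:
            return False
        self.to_array[port].append(value)
        return True

    def read(self, idx):
        """Register read. Returns None when a port FIFO is empty (the core would stall)."""
        port = self._port(idx)
        if port is None:
            return self.regs[idx]
        return self.from_array[port].popleft() if self.from_array[port] else None


def run_array(core, port, chain):
    """Drain one input port through a configured chain of computing units.

    `chain` is a hypothetical stand-in for a NoC route: the order in which
    each operand visits the units before its result returns to the core.
    """
    while core.to_array[port]:
        value = core.to_array[port].popleft()
        for unit in chain:
            value = unit(value)
        core.from_array[port].append(value)


core = Core()
core.write(PORT_BASE + 1, 21)           # like "mov r33, 21": the operand goes to the array
run_array(core, 1, [lambda x: x * 2])   # one unit configured as a doubler (placeholder op)
print(core.read(PORT_BASE + 1))         # prints 42
```

In this sketch the core reaches the array through ordinary register operands rather than loads and stores to a memory-mapped address, which is the sense in which register mapping couples the array tightly to the core. Reconfiguring the array amounts to passing a different `chain`.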