In many computing domains, hardware accelerators can take over specialized functions from software running on general-purpose processor cores, improving performance and lowering power consumption. In network applications, many hardware accelerators are stateless, so a network flow can be processed only after all of its packets have arrived. Stateful accelerators, by contrast, exploit the packet-oriented streaming nature of networks and process each packet as soon as it arrives, which gives better performance and flexibility. Because many flows are active concurrently, a stateful accelerator must maintain the state of each accelerated stream and switch hardware context between them when needed, which adds overhead to acceleration. This paper proposes dynamically reordering the requests of different accelerated streams in a hybrid request queue, whose entries may reside on chip or off chip in memory, to reduce the number of accelerator state switches. A simulation-based performance study of several popular stateful accelerators shows that the approach lowers average response time and improves throughput. Compared with a traditional FIFO design, it improves the throughput of decompression acceleration by up to 26.7% and reduces its response time by up to 50%.
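The effect of request reordering on state switches can be illustrated with a minimal sketch. It is not the scheduling policy evaluated in the paper, which the abstract does not specify. It assumes a simple rule: among the first `window` pending requests, serve one that belongs to the flow whose state is already loaded, and otherwise fall back to the oldest request. The names `count_switches`, `reorder`, and `window` are hypothetical, and the bounded window is only a stand-in for a limited on-chip queue.

```python
from collections import deque


def count_switches(order):
    """Count how often the accelerator must load a different flow's state."""
    switches = 0
    current = None
    for flow_id, _seq in order:
        if flow_id != current:
            switches += 1
            current = flow_id
    return switches


def reorder(requests, window=4):
    """Return a service order that prefers the currently loaded flow.

    Assumed policy (illustrative only): look at the first `window` pending
    requests and serve the earliest one that belongs to the current flow;
    if there is none, serve the oldest request. Scanning from the front
    keeps the requests of each individual flow in their original order.
    """
    pending = deque(requests)
    served = []
    current = None
    while pending:
        pick = 0  # default: oldest request, i.e. plain FIFO behaviour
        for i in range(min(window, len(pending))):
            if pending[i][0] == current:
                pick = i
                break
        request = pending[pick]
        del pending[pick]
        served.append(request)
        current = request[0]
    return served


# Two interleaved flows, each request tagged (flow_id, sequence_number).
requests = [("A", 0), ("B", 0), ("A", 1), ("B", 1), ("A", 2), ("B", 2)]
fifo_switches = count_switches(requests)
reordered = reorder(requests, window=4)
reordered_switches = count_switches(reordered)
```

For this interleaved input, FIFO service loads a new state for every request, while the reordered service groups the requests of each flow. With `window=1` the sketch degenerates to FIFO. A practical design would also need a bound on how long a request can be bypassed, which this sketch omits.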