Limited off-chip memory bandwidth is one of the bottlenecks constraining further performance gains in stream processors. Stream memory systems already employ a variety of techniques to alleviate this problem, but current designs do not adequately account for how application-specific memory access patterns affect effective bandwidth utilization. Through analysis and experiments, this paper first evaluates how the main design parameters of the stream memory system optimize different access patterns. Based on these results, architectural modifications are proposed for different degrees of stream access parallelism: widening the issue width of the address generators and adding support for short-job-first scheduling. These changes fully exploit the locality and parallelism of memory accesses and improve load balance, thereby effectively raising the utilization efficiency of off-chip DRAM bandwidth and the overall performance of stream programs.