To improve the efficiency of the texture unit in embedded graphics processors, a multi-ported texture cache architecture is proposed. The architecture combines tile-based rasterization with a block-interleaved texture memory organization, which exploits the locality among texel accesses and raises the cache hit rate. It also employs cache prefetching, which effectively hides memory access latency. To further increase data throughput, the cache provides four read ports, so that four texels can be read in parallel. Simulation results show that the proposed cache achieves a hit rate of about 92%, that its memory access performance reaches 90% of that of a zero-latency memory system, and that its data throughput is 3 to 4 times that of a single-ported cache.
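The effect of a blocked texture layout on locality, and the reason a four-port cache suits a 2x2 texel footprint, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the 4x4 block size, the 16-texel cache line, the 64-texel texture width, the function names, and the parity-based bank mapping, which is one common way to serve four texels in parallel and may differ from the proposed design.

```python
# Illustrative sketch only: block size, line size, texture width and the
# bank mapping are assumptions, not parameters from the paper.

def linear_address(u, v, width):
    """Row-major texel address."""
    return v * width + u

def blocked_address(u, v, width, block=4):
    """Texel address when the texture is stored as block x block tiles."""
    blocks_per_row = width // block
    block_index = (v // block) * blocks_per_row + (u // block)
    offset = (v % block) * block + (u % block)
    return block_index * block * block + offset

def lines_touched(addr_fn, texels, width, line_size=16):
    """Number of distinct cache lines covered by a set of texels."""
    return len({addr_fn(u, v, width) // line_size for u, v in texels})

def bank_of(u, v):
    """Hypothetical 4-bank mapping based on the parity of u and v."""
    return (v & 1) * 2 + (u & 1)

WIDTH = 64
# A 4x4 group of texels aligned to a block boundary, as a tile-ordered
# rasterizer would tend to request.
tile = [(u, v) for v in range(8, 12) for u in range(8, 12)]

linear_lines = lines_touched(linear_address, tile, WIDTH)
blocked_lines = lines_touched(blocked_address, tile, WIDTH)

# The four texels of a 2x2 footprint starting at an arbitrary position.
u0, v0 = 5, 9
quad = [(u0, v0), (u0 + 1, v0), (u0, v0 + 1), (u0 + 1, v0 + 1)]
quad_banks = {bank_of(u, v) for u, v in quad}

print("cache lines touched, linear layout: ", linear_lines)
print("cache lines touched, blocked layout:", blocked_lines)
print("distinct banks for a 2x2 footprint: ", len(quad_banks))
```

Under these assumptions the aligned 4x4 tile spans four cache lines in the row-major layout but only one in the blocked layout, and any 2x2 footprint maps to four different banks, so its four texels could be read in the same cycle.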