Off-chip replacement (capacity and conflict) misses and coherent read misses in a distributed shared memory system stall execution for hundreds of cycles. These misses recur and form sequences of two or more misses, called streams. Prior streaming techniques ignored both the reordering of misses and not-recently-accessed streams while streaming data. In this paper, we present a stream prefetcher design that deals with both problems. Our design uses stream waiting rooms to store not-recently-accessed streams, which helps remove more off-chip misses. Using trace-based simulation, our stream prefetcher removes 8% to 66% (40% on average) of replacement misses and 17% to 63% (39% on average) of coherent read misses. Using cycle-accurate full-system simulation, our design achieves speedups of 1.00 to 1.17 on workloads from the Princeton Application Repository for Shared-Memory Computers (PARSEC) running on a distributed shared memory system, with the exception of the dedup and swaptions workloads.
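The abstract does not describe the prefetcher's internal structures. The following Python sketch is therefore only a toy model built on several assumptions.

- A stream is a pointer into a global history of miss addresses, in the style of temporal streaming.
- A miss is accepted as part of a stream if it appears anywhere in the next `window + 1` expected addresses. This tolerates mild reordering.
- Streams that fall out of a small LRU-managed active set are parked in a bounded waiting room instead of being discarded.

The class name, its parameters (`depth`, `window`, `max_active`, `waiting_room_size`), and the unbounded prefetch buffer are hypothetical simplifications, not the paper's actual design.

```python
from collections import OrderedDict


class StreamPrefetcherSketch:
    """Hypothetical toy temporal-stream prefetcher with a stream waiting room.

    A 'stream' is a position in the global miss history: the addresses that
    followed a miss the last time it occurred. All parameters are
    illustrative assumptions, not values from the paper.
    """

    def __init__(self, depth=4, window=2, max_active=2, waiting_room_size=4):
        self.depth = depth                    # history entries prefetched ahead
        self.window = window                  # extra positions tolerated for reordered misses
        self.max_active = max_active          # number of active stream slots
        self.waiting_room_size = waiting_room_size
        self.history = []                     # every off-chip miss address, in order
        self.index = {}                       # address -> its latest position in history
        self.active = OrderedDict()           # stream id -> next expected position (LRU first)
        self.waiting = OrderedDict()          # parked not-recently-accessed streams
        self.buffer = set()                   # prefetched addresses not yet consumed
        self.next_id = 0

    def _prefetch(self, pos):
        # Fetch the next `depth` addresses recorded at and after `pos`.
        self.buffer.update(self.history[pos:pos + self.depth])

    def _match(self, streams, addr):
        # Accept addr if it is among the next window+1 expected entries,
        # so mildly reordered misses do not break the stream.
        for sid, pos in streams.items():
            expected = self.history[pos:pos + self.window + 1]
            if addr in expected:
                return sid, pos + expected.index(addr) + 1
        return None

    def _activate(self, sid, pos):
        self.active[sid] = pos
        self.active.move_to_end(sid)          # mark as most recently used
        while len(self.active) > self.max_active:
            old_sid, old_pos = self.active.popitem(last=False)
            self.waiting[old_sid] = old_pos   # park the LRU stream instead of dropping it
        while len(self.waiting) > self.waiting_room_size:
            self.waiting.popitem(last=False)  # waiting room full: drop the oldest
        self._prefetch(pos)

    def access(self, addr):
        """Process one would-be off-chip miss; return True if it was prefetched."""
        covered = addr in self.buffer
        self.buffer.discard(addr)

        hit = self._match(self.active, addr)
        if hit is None:
            hit = self._match(self.waiting, addr)
            if hit is not None:
                del self.waiting[hit[0]]      # reactivate a parked stream
        if hit is not None:
            self._activate(*hit)
        elif not covered and addr in self.index:
            # Real miss with history: start a stream after its last occurrence.
            self._activate(self.next_id, self.index[addr] + 1)
            self.next_id += 1

        self.index[addr] = len(self.history)
        self.history.append(addr)
        return covered
```

The waiting room matters when two streams interleave. Set `max_active=1` and `depth=1`, then replay the misses `1, 2, 10, 11, 3, 4` after training on `1, 2, 3, 4, 10, 11, 12, 13`. The first stream is evicted when the second one starts. Only with `waiting_room_size > 0` can it be reactivated on the miss to 3 and go on to cover the miss to 4.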