The Case of Using Multiple Streams in Streaming

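ISSN: 1000-9000
Journal: Journal of Computer Science and Technology (English edition)
Date: 0
Classification: TP393 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology] TP332 [Automation and Computer Technology: Computer System Architecture; Automation and Computer Technology: Computer Science and Technology]
Author affiliation: [1] Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science & Technology, Tsinghua University, Beijing 100084, China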
Funding: This work was supported by the Higher Education Commission (Pakistan), the National High Technology Research and Development Program of China (863 Program) (No. 2008AA01A201), the Natural Science Foundation of China (Nos. 60833004 and 60970002), and the TNList Cross-discipline Foundation.

Authors: Muhammad Abid Mughal, Hai-Xia Wang, Dong-Sheng Wang [1]

Keywords: data stream, memory system, usage cycle, Princeton, workload, distributed, design, shared, prefetching, stream first in first out (FIFO), Princeton Application Repository for Shared-Memory Computers (PARSEC), stream waiting rooms, reordering of misses, sequitur.

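Chinese abstract (translated):

Off-chip replacement misses (capacity and conflict) and coherent read misses in a distributed shared memory system stall execution for hundreds of cycles. These off-chip replacement and coherent read misses recur and form sequences of two or more misses, called streams. When streaming data, prior streaming techniques ignored the reordering of misses and not-recently-accessed streams. In this paper, we present a stream prefetcher design that can handle both problems. Our stream prefetcher design uses stream waiting rooms to store not-recently-accessed streams. Stream waiting rooms help remove more off-chip misses. Using trace-based simulation, our stream prefetcher design can remove 8% to 66% (on average 40%) of replacement misses and 17% to 63% (on average 39%) of coherent read misses. Using cycle-accurate full-system simulation, our design gives speedups of 1.00 to 1.17 on Princeton Application Repository for Shared-Memory Computers (PARSEC) workloads running on a distributed shared memory system, with the exception of the dedup and swaptions workloads.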

English abstract:

Off-chip replacement (capacity and conflict) misses and coherent read misses in a distributed shared memory system cause execution to stall for hundreds of cycles. These misses recur and form sequences of two or more misses, called streams. Prior streaming techniques ignored both the reordering of misses and not-recently-accessed streams while streaming data. In this paper, we present a stream prefetcher design that deals with both problems. Our design uses stream waiting rooms to store not-recently-accessed streams, which helps remove more off-chip misses. Using trace-based simulation, our stream prefetcher design removes 8% to 66% (on average 40%) of replacement misses and 17% to 63% (on average 39%) of coherent read misses. Using cycle-accurate full-system simulation, our design gives speedups of 1.00 to 1.17 on Princeton Application Repository for Shared-Memory Computers (PARSEC) workloads running on a distributed shared memory system, with the exception of the dedup and swaptions workloads.
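
The abstract does not give the prefetcher's internals, so the following is only a toy sketch of the general idea, not the authors' design. It assumes that a stream is a fixed-length run of miss addresses keyed by its first address, that recently used streams live in a small LRU table, that streams evicted from that table move to a waiting room instead of being discarded, and that prefetched addresses sit in an unordered buffer so that reordered misses still hit. All names (StreamPrefetcher, covered_misses, recent_cap, waiting_cap, stream_len) are hypothetical.

```python
from collections import OrderedDict


class StreamPrefetcher:
    """Toy model: an LRU table of recent streams plus a waiting room."""

    def __init__(self, recent_cap=2, waiting_cap=4, stream_len=3):
        self.recent = OrderedDict()   # head address -> tuple of following addresses
        self.waiting = OrderedDict()  # streams evicted from `recent` (assumed policy)
        self.recent_cap = recent_cap
        self.waiting_cap = waiting_cap
        self.stream_len = stream_len
        self.buffer = set()           # prefetched addresses; a set ignores miss order
        self.current = []             # misses being recorded into a new stream

    def _store(self, head, body):
        """Insert a stream as most recent; demote the oldest one if over capacity."""
        self.recent[head] = body
        self.recent.move_to_end(head)
        if len(self.recent) > self.recent_cap:
            old_head, old_body = self.recent.popitem(last=False)
            if self.waiting_cap > 0:
                self.waiting[old_head] = old_body
                if len(self.waiting) > self.waiting_cap:
                    self.waiting.popitem(last=False)

    def _lookup(self, addr):
        """Return the stream body headed by `addr`, or None if it is not stored."""
        if addr in self.recent:
            self.recent.move_to_end(addr)
            return self.recent[addr]
        if addr in self.waiting:
            body = self.waiting.pop(addr)
            self._store(addr, body)   # promote back to the recent table
            return body
        return None

    def access(self, addr):
        """Process one miss; return True if a prefetch had already covered it."""
        if addr in self.buffer:
            self.buffer.discard(addr)
            return True
        body = self._lookup(addr)
        if body is not None:
            self.buffer.update(body)  # "prefetch" the rest of the stream
            self.current = []
            return False
        self.current.append(addr)     # otherwise keep recording a new stream
        if len(self.current) == self.stream_len:
            self._store(self.current[0], tuple(self.current[1:]))
            self.current = []
        return False


def covered_misses(trace, **kwargs):
    """Count how many misses in `trace` the toy prefetcher covers."""
    prefetcher = StreamPrefetcher(**kwargs)
    return sum(prefetcher.access(addr) for addr in trace)
```

On the trace [1, 2, 3, 10, 11, 12, 20, 21, 22, 1, 3, 2] with the default recent_cap of 2, the first stream is pushed out before it recurs. With waiting_cap=0 this model covers no misses. With waiting_cap=4 the stream is recovered from the waiting room and the two reordered misses (3, then 2) are covered.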
