A Cache Architecture Trading Off Latency and Capacity in Chip Multiprocessors

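Citation: Xiao Junhua (肖俊华), Feng Zijun (冯子军), Zhang Longbing (章隆兵). A Cache Architecture Trading Off Latency and Capacity in Chip Multiprocessors. Journal of Computer Research and Development (计算机研究与发展). 46(1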
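Classification: TP302 [Automation and Computer Technology: Computer System Architecture; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; [2] Graduate University of Chinese Academy of Sciences, Beijing 100049
Funding: National Natural Science Foundation of China (60673146, 60603049, 60703017, 60736012); National High Technology Research and Development Program of China ("863" Program) (2006AA010201, 2007AA012114); National Basic Research Program of China ("973" Program) (2005CB321600)
Related project: Research on cache block distribution techniques for chip multiprocessors with a shared L2 cache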

Keywords: chip multiprocessors, TCLC, L2 cache, replication, migration, center placement

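Abstract (translated from the Chinese):

The design of the L2 cache in chip multiprocessors (CMPs) faces a conflict: low latency and large capacity cannot both be achieved. A private organization has a lower hit latency but reduces the effective cache capacity, whereas a shared organization increases the effective capacity but has a longer hit latency. This paper proposes a cache architecture for CMPs, the tradeoff cache between latency and capacity (TCLC). The architecture is a hybrid of the private and shared designs. Its core idea is to dynamically identify the sharing type of each cache block and to optimize each type separately: private blocks are optimized by migration, shared read-only blocks by replication, and shared read-write blocks by center placement. The aim is an access latency close to that of the private design and an effective capacity close to that of the shared design, thereby mitigating the impact of wire delay and reducing the average memory access latency. Full-system simulation results show that TCLC improves performance by 13.7% on average over the private design and by 12% on average over the shared design.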

English abstract:

Chip multiprocessors (CMPs) have become the mainstream microprocessor architecture. In a CMP, the cache, especially the last-level cache, is critical to performance and has become a focus of current research. A CMP cache faces conflicting latency and capacity requirements and has to trade off between techniques that reduce off-chip misses and techniques that reduce cross-chip misses. The private cache design minimizes the cache access latency but reduces the total effective cache capacity. The shared cache design maximizes the effective cache capacity but incurs a long hit latency. In this paper, a CMP cache design, the tradeoff cache between latency and capacity (TCLC), is proposed. TCLC is a hybrid of the private and shared designs. It dynamically identifies the sharing type of each cache block and optimizes each type accordingly: private blocks through a migration policy, shared read-only blocks through a replication policy, and shared read-write blocks through a center placement policy. TCLC aims for a cache access latency close to that of the private design and an effective cache capacity close to that of the shared design, which mitigates the impact of wire delay and reduces the average memory access latency. The experimental results indicate that this proposal performs 13.7% better than a private cache and 12% better than a shared cache on average.
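
The classification step described in the abstract can be illustrated with a small software sketch. This is not the paper's hardware mechanism: the abstract does not say how TCLC detects sharing types, so the per-block bookkeeping here (a set of accessing cores plus a written flag) and the names BlockState, choose_policy, and POLICY_FOR are hypothetical. Only the mapping from sharing type to policy is taken from the abstract.

```python
from dataclasses import dataclass, field
from enum import Enum


class SharingType(Enum):
    PRIVATE = "private"
    SHARED_READ_ONLY = "shared read-only"
    SHARED_READ_WRITE = "shared read-write"


class Policy(Enum):
    MIGRATION = "migration"
    REPLICATION = "replication"
    CENTER_PLACEMENT = "center placement"


# Mapping stated in the abstract: one optimization per sharing type.
POLICY_FOR = {
    SharingType.PRIVATE: Policy.MIGRATION,
    SharingType.SHARED_READ_ONLY: Policy.REPLICATION,
    SharingType.SHARED_READ_WRITE: Policy.CENTER_PLACEMENT,
}


@dataclass
class BlockState:
    """Hypothetical per-block bookkeeping (an assumption, not from the paper)."""
    accessors: set = field(default_factory=set)  # ids of cores that touched the block
    written: bool = False                        # True once any core has written it

    def record(self, core: int, is_write: bool) -> None:
        self.accessors.add(core)
        self.written = self.written or is_write

    def sharing_type(self) -> SharingType:
        # A block seen by at most one core is treated as private.
        if len(self.accessors) <= 1:
            return SharingType.PRIVATE
        # Shared blocks split on whether any write has been observed.
        if self.written:
            return SharingType.SHARED_READ_WRITE
        return SharingType.SHARED_READ_ONLY


def choose_policy(accesses):
    """Replay (core, is_write) accesses to one block and return its policy."""
    state = BlockState()
    for core, is_write in accesses:
        state.record(core, is_write)
    return POLICY_FOR[state.sharing_type()]
```

For example, choose_policy([(0, False), (1, False)]) returns Policy.REPLICATION, because two cores read the block and none wrote it. A real design would also need to handle blocks whose sharing type changes over time, which this sketch only captures by re-evaluating the type after each recorded access.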
