EOFDM: A Minimum-Energy Search Method for Many-Core Architectures

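ISSN: 1000-1239
Journal: Journal of Computer Research and Development (计算机研究与发展)
Date: 2015-06-01
Pages: 1303-1315
Classification: TP303 [Automation and Computer Technology: Computer System Architecture; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; [2] University of Chinese Academy of Sciences, Beijing 100049
Funding: National Basic Research Program of China (973 Program) (2011CB302501); National High Technology Research and Development Program of China (863 Program) (2015AA011204, 2012AA010901); National Science and Technology Major Project "Core Electronic Devices, High-End Generic Chips and Basic Software" (2013ZX0102-8001-001-001); Key Program of the National Natural Science Foundation of China (61332009, 61173007)
Related project: Research on Non-Equivalent Logic Extraction Methods for Functional ECO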

Keywords: processor microarchitecture, instruction cache (ICache), dataflow, instruction renaming, dataflow locality

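Chinese abstract (translated):

To exploit both the thread-level and the instruction-level parallelism of programs, dynamic multi-core techniques reconfigure several small cores into one stronger virtual core to match the varied demands of programs. Such a virtual core usually performs worse than a native core that occupies the same amount of chip resources. One important reason is that the frontend pipeline stages (fetch, decode, and rename) process instructions serially and are therefore hard to make cooperate after reconfiguration. To address this problem, a new frontend structure, the dataflow cache, is proposed together with a matching vector renaming mechanism. The dataflow cache exploits the dataflow locality of programs by storing and reusing the data dependencies and other information within instruction basic blocks. With a dataflow cache, a processor core can better exploit instruction-level parallelism and reduce the branch misprediction penalty, while a virtual core in a dynamic multi-core design uses the dataflow cache to bypass the traditional frontend stages, which resolves the frontend cooperation problem. Experiments on SPEC CPU2006 programs show that the dataflow cache covers more than 90% of the dynamic instructions of most programs at limited cost; the effect of adding a dataflow cache on pipeline performance is then analyzed. With a frontend width of 4 instructions and an instruction window of 512 entries, a virtual core with the dataflow cache improves performance by 9.4% on average, and by as much as 28% for some programs.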

English abstract:

To exploit both the thread-level parallelism (TLP) and the instruction-level parallelism (ILP) of programs, dynamic multi-core techniques reconfigure multiple small cores into a more powerful virtual core. A virtual core is usually weaker than a native core with equivalent chip resources. One important reason is that the fetch, decode, and rename frontend stages are hard to coordinate after reconfiguration because they process instructions serially. To solve this problem, we propose a new frontend design called the dataflow cache, with a corresponding vector renaming (VR) mechanism. By caching and reusing the data dependencies and other information of instruction basic blocks, the dataflow cache exploits the dataflow locality of programs. First, a processor core with a dataflow cache can exploit more instruction-level parallelism and lower the branch misprediction penalty. Second, a virtual core in a dynamic multi-core design can solve its frontend problem by using the dataflow cache to bypass the traditional frontend stages. Experiments on the SPEC CPU2006 programs show that the dataflow cache can cover 90% of the dynamic instructions at limited cost. We then analyze the performance effect of adding the dataflow cache to the pipeline. Finally, experiments show that with a 4-instruction-wide frontend and a 512-entry instruction window, the performance of the virtual core with a dataflow cache improves by 9.4% on average, with a maximum of 28% for some programs.
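The central idea in the abstract, caching the data dependencies of a basic block so they can be reused instead of recomputed by the serial frontend, can be illustrated with a toy software model. The sketch below is only an illustration under stated assumptions and is not the paper's hardware design: the names `Instr`, `analyze_block`, and `DataflowCache`, the register-name encoding, and the use of the block's start address as the cache key are all hypothetical, and the vector renaming mechanism is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction: one optional destination, some sources."""
    dest: Optional[str]
    srcs: Tuple[str, ...]


# For each instruction, one entry per source operand: the index of the
# in-block producer, or None if the value is live-in to the block.
BlockDeps = List[Tuple[Optional[int], ...]]


def analyze_block(block: List[Instr]) -> BlockDeps:
    """Compute intra-block data dependencies in a single serial pass."""
    last_writer: Dict[str, int] = {}
    deps: BlockDeps = []
    for i, ins in enumerate(block):
        deps.append(tuple(last_writer.get(s) for s in ins.srcs))
        if ins.dest is not None:
            last_writer[ins.dest] = i
    return deps


class DataflowCache:
    """Toy cache keyed by block start address (an assumption of this sketch)."""

    def __init__(self) -> None:
        self.entries: Dict[int, BlockDeps] = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, pc: int, block: List[Instr]) -> BlockDeps:
        if pc in self.entries:
            self.hits += 1      # reuse stored dependencies, skip analysis
        else:
            self.misses += 1    # first visit: analyze serially, then store
            self.entries[pc] = analyze_block(block)
        return self.entries[pc]
```

In this model, a block that executes repeatedly (for example a loop body) pays the serial analysis cost once and hits on every later visit, which is the kind of dataflow locality the abstract refers to. The unbounded dictionary stands in for what would be a fixed-capacity structure in hardware.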
