为了能够同时发掘程序的线程级并行性和指令级并行性,动态多核技术通过将数个小核重构为一个较强的虚拟核来适应程序多样的需求.通常这种虚拟核性能弱于占有等量芯片资源的原生核,一个重要的原因就是取指、译码和重命名等流水线的前端各阶段具有串行处理的特征较难经重构后协同工作.为解决此问题,提出了新的前端结构——数据流缓存,并给出与之配合的向量重命名机制.数据流缓存利用程序的数据流局部性,存储并重用指令基本块内的数据依赖等信息.处理器核利用数据流缓存能更好地发掘程序的指令级并行性并降低分支预测错误的惩罚,而动态多核技术中的虚拟核通过使用数据流缓存旁路传统的流水线前端各阶段,其前端难协同工作的问题得以解决.对SPEC CPU2006中程序的实验证明了数据流缓存能够以有限代价覆盖大部分程序超过90%的动态指令,然后分析了添加数据流缓存对流水线性能的影响.实验证明,在前端宽度为4条指令、指令窗口容量为512的配置下,采用数据流缓存的虚拟核性能平均提升9.4%,某些程序性能提升高达28%.
To exploit both the thread-level parallelism (TLP) and the instruction-level parallelism (ILP) of programs, dynamic multi-core techniques reconfigure several small cores into one more powerful virtual core to adapt to diverse program demands. Such a virtual core is usually weaker than a native core that occupies an equivalent amount of chip resources. One important reason is that the frontend pipeline stages, such as fetch, decode, and rename, process instructions serially and are therefore hard to make cooperate after reconfiguration. To solve this problem, we propose a new frontend structure called the dataflow cache, together with a corresponding vector renaming (VR) mechanism. The dataflow cache exploits the dataflow locality of programs by storing and reusing the data dependencies and other information within instruction basic blocks. First, a processor core with a dataflow cache can better exploit instruction-level parallelism and reduce the branch misprediction penalty. Second, a virtual core in a dynamic multi-core processor can use the dataflow cache to bypass the traditional frontend stages, which resolves the difficulty of frontend cooperation. Experiments on SPEC CPU2006 programs show that the dataflow cache can cover more than 90% of the dynamic instructions of most programs at limited cost. We then analyze how adding the dataflow cache affects pipeline performance. Finally, experiments show that, with a 4-instruction-wide frontend and a 512-entry instruction window, the virtual core with the dataflow cache improves performance by 9.4% on average and by up to 28% for some programs.
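As an illustration of the central idea, the following minimal sketch shows one way that the dependency information of a basic block could be computed once and then reused whenever the block executes again. It is not the design evaluated in this work: the instruction format `(dest, [srcs])`, the names `BlockDataflow`, `analyze_block`, and `DataflowCache`, the lookup keyed by the block's starting address, and the LRU replacement policy are all assumptions made for the example, and the vector renaming mechanism is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical instruction format: (destination register or None, source registers).
Instr = Tuple[Optional[str], List[str]]
# A source operand is either ("in", i): produced by instruction i of the same block,
# or ("ext", r): a live-in value read from architectural register r.
Operand = Tuple[str, Union[int, str]]


@dataclass
class BlockDataflow:
    sources: List[List[Operand]]  # one operand list per instruction
    live_out: Dict[str, int]      # register -> index of its last writer in the block


def analyze_block(block: List[Instr]) -> BlockDataflow:
    """Resolve intra-block dependencies in a single serial pass (the costly step)."""
    last_writer: Dict[str, int] = {}
    sources: List[List[Operand]] = []
    for idx, (dest, srcs) in enumerate(block):
        ops: List[Operand] = []
        for reg in srcs:
            if reg in last_writer:
                ops.append(("in", last_writer[reg]))
            else:
                ops.append(("ext", reg))
        sources.append(ops)
        if dest is not None:
            last_writer[dest] = idx
    return BlockDataflow(sources, dict(last_writer))


class DataflowCache:
    """Caches per-block dataflow information, keyed by the block's start address."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.entries: Dict[int, BlockDataflow] = {}  # dict order doubles as LRU order
        self.hits = 0
        self.misses = 0

    def lookup(self, pc: int, block: List[Instr]) -> BlockDataflow:
        if pc in self.entries:
            self.hits += 1
            info = self.entries.pop(pc)  # re-insert to mark as most recently used
            self.entries[pc] = info
            return info
        self.misses += 1
        info = analyze_block(block)      # miss: fall back to the serial analysis
        if len(self.entries) >= self.capacity:
            del self.entries[next(iter(self.entries))]  # evict least recently used
        self.entries[pc] = info
        return info


# Example block: r1 = r2 op r3; r4 = r1 op r2; r1 = op r4
block = [("r1", ["r2", "r3"]), ("r4", ["r1", "r2"]), ("r1", ["r4"])]
cache = DataflowCache(capacity=2)
first = cache.lookup(0x100, block)   # miss: dependencies are analyzed and stored
second = cache.lookup(0x100, block)  # hit: the stored result is reused
```

In this sketch, a hit returns the stored dependency information without repeating the serial pass over the block, which is the sense in which a cache of this kind could let the serial frontend work be bypassed for blocks that recur.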