How to program many-core architectures is a pressing open problem. This paper studies scalable hardware mechanisms to support the Cilk programming model, with the aim of balancing good programmability against a scalable hardware implementation. The Cilk language is a minimal extension of C: writing a Cilk program is similar to sequential programming, and programmers need not concern themselves with low-level system issues such as scheduling, load balancing, and data locality. The work builds on the scope consistency memory model and makes two contributions. First, to address the poor programmability of scope consistency, it proposes a data-centric approach to maintaining cache coherence. Second, it proposes a cache coherence protocol that implements DAG consistency, and uses that protocol to support the Cilk programming model. Experimental results on a set of scientific benchmark programs show that all benchmarks achieve good speedup when the number of cores is small (<16). The results also reveal two fundamental reasons why ideal speedup is hard to obtain in the many-core case (>16): unbalanced on-chip network bandwidth utilization caused by static routing, and limited memory bandwidth.