When parallel applications run on a chip multiprocessor with a shared cache, conflicting accesses to that cache degrade performance and make execution time unpredictable. Shared-cache partitioning, which allocates the shared cache exclusively among multiple processes, is an effective remedy. Because threads of the same application share data, applications with different thread counts use the shared cache with different efficiency. However, traditional partitioning algorithms that aim for the lowest miss rate, such as Utility-based Cache Partitioning (UCP), do not distinguish applications by thread count. This paper presents Weighted Cache Partitioning (WCP), a dynamic shared-cache partitioning framework for multi-threaded, multi-programmed workloads. The framework consists of an Application-oriented Miss Rate Monitor (AMRM) and a weighted cache partitioning algorithm. The monitor dynamically tracks, per process, each application's miss rate under different cache capacities. The weighted algorithm extends the traditional miss-rate-optimal partitioning algorithm by assigning each application a weight according to its thread count, so that applications with more threads receive more of the shared cache and overall system performance improves. Experiments show that although WCP yields a higher miss rate than the miss-rate-oriented partitioning algorithm, it improves IPC throughput, weighted speedup, and fairness. For multi-programmed workloads composed of scientific and engineering computing applications, WCP-1 improves IPC throughput over the miss-rate-oriented algorithm by up to 10.8% and by 5.5% on average.
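The weighted partitioning idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the weighting formula, so the code assumes a hypothetical weight of `threads ** alpha` and picks the way allocation that minimizes the weighted sum of misses by exhaustive dynamic programming. The names `weighted_partition`, `miss_curves`, and `alpha`, as well as the sample miss curve, are invented for illustration, and the per-application miss curves are assumed to come from a monitor such as AMRM.

```python
def weighted_partition(miss_curves, threads, total_ways, alpha=1.0):
    """Split total_ways cache ways among applications.

    miss_curves[i][w] is the miss count of application i when it owns w ways
    (w = 0..total_ways). The weight threads[i] ** alpha is an assumption of
    this sketch, not a formula taken from the paper. alpha = 0 reduces to a
    plain minimum-total-misses partition.
    """
    n = len(miss_curves)
    if total_ways < n:
        raise ValueError("need at least one way per application")
    weights = [t ** alpha for t in threads]
    inf = float("inf")
    # best[i][w]: minimum weighted misses for the first i apps using exactly w ways
    best = [[inf] * (total_ways + 1) for _ in range(n + 1)]
    choice = [[0] * (total_ways + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(1, n + 1):
        for w in range(total_ways + 1):
            for k in range(1, w + 1):  # every application gets at least one way
                prev = best[i - 1][w - k]
                if prev == inf:
                    continue
                cost = prev + weights[i - 1] * miss_curves[i - 1][k]
                if cost < best[i][w]:
                    best[i][w] = cost
                    choice[i][w] = k
    # Walk back through the recorded choices to recover the allocation.
    alloc = [0] * n
    w = total_ways
    for i in range(n, 0, -1):
        alloc[i - 1] = choice[i][w]
        w -= alloc[i - 1]
    return alloc


# Two applications with the same (made-up) miss curve, 1 thread vs. 4 threads.
curve = [100, 60, 40, 30, 24, 20, 17, 15, 14]
unweighted = weighted_partition([curve, curve], [1, 4], 8, alpha=0.0)
weighted = weighted_partition([curve, curve], [1, 4], 8, alpha=1.0)
print(unweighted)  # [4, 4]
print(weighted)    # [2, 6]
```

In this toy input the weighted split gives the four-thread application six of the eight ways. The unweighted miss total rises from 48 to 57, which mirrors the trade-off the abstract reports: a higher miss rate accepted in exchange for favoring applications with more threads.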