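This work studies the performance degradation caused by the directory in shared-cache coherence protocols for multi-core and many-core parallel systems, where the directory records which cores hold private copies of each shared cache block. The study finds that multi-core and many-core systems in practical use need not store sharing information for shared cache blocks, because most of them adopt a weak consistency protocol, under which a write by one core need not be observed immediately by other cores; its visibility can be deferred until the next synchronization point. Based on this finding, a directory-less (DirectoryLess) shared-cache (Shared cache) coherence protocol that records no sharing information, called the DLS protocol, is proposed. At each synchronization point, the protocol proactively invalidates every cache block that cannot be ruled out as having been modified by another core, and thereby guarantees weak consistency in a multi-core system without a directory for storing sharing information. Experiments were carried out on a 16-core processor with the SPLASH-2 parallel benchmark suite. The results show that, compared with the directory-based MESI protocol, DLS not only eliminates the directory and its circuit area entirely, but also improves program performance by 11.08% on average, reduces on-chip network traffic by 28.83%, and reduces power consumption by 15.65%. All of this requires changes only to the processor design, not to programming languages or compilers, so the protocol is compatible with existing code without modification or recompilation.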
The performance degradation caused by the directory in shared cache coherence protocols for multi/many-core parallel systems, which use the directory to record the private copies of each shared cache block held by the cores, was studied. The study found that multi/many-core systems in practical use need not store the sharing information of shared cache blocks, because these systems mostly use a weak consistency protocol. Under such a protocol, a core's write operation need not be observed immediately by other cores; its visibility can be deferred until the next synchronization point. Based on this finding, a directory-less (DirectoryLess) shared cache coherence protocol that records no sharing information, called DLS, was put forward. DLS completely removes the directory and the Invalidation/Ack messages, and efficiently maintains cache coherence by using a novel self-suspicion + speculative execution mechanism. The SPLASH-2 benchmark was used to test a 16-core processor, and the results show that, compared with the traditional MESI protocol with a full directory, DLS not only completely removes the chip area cost of the directory, but also, on average, improves processor performance by 11.08%, reduces the overall network traffic by 28.83%, and reduces the energy consumption in network communication by 15.65%. Moreover, DLS does not involve any modification to programming languages or compilers, and hence is seamlessly compatible with legacy code.
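
The core idea, that a write only has to become visible to other cores at the next synchronization point, can be illustrated with a toy software model. The following Python sketch is not the DLS hardware design: the class names (`SharedCache`, `Core`), the write-back of dirty blocks at a synchronization point, and the choice to invalidate the whole private cache are assumptions made purely for illustration, and the self-suspicion + speculative execution mechanism is not modeled.

```python
class SharedCache:
    """Hypothetical shared last-level cache; note that it keeps no directory."""

    def __init__(self):
        self.data = {}

    def read(self, addr):
        return self.data.get(addr, 0)

    def write(self, addr, value):
        self.data[addr] = value


class Core:
    """Hypothetical core with a private cache and no sharer tracking."""

    def __init__(self, shared):
        self.shared = shared
        self.private = {}   # addr -> locally cached value
        self.dirty = set()  # addresses written since the last sync

    def load(self, addr):
        # A miss fetches from the shared cache; a hit may return stale data,
        # which weak consistency tolerates between synchronization points.
        if addr not in self.private:
            self.private[addr] = self.shared.read(addr)
        return self.private[addr]

    def store(self, addr, value):
        # Assumption: writes stay local until the next synchronization point.
        self.private[addr] = value
        self.dirty.add(addr)

    def sync(self):
        # Assumption: at a synchronization point the core publishes its own
        # writes, then invalidates every private block, since any of them
        # might have been changed by another core in the meantime.
        for addr in self.dirty:
            self.shared.write(addr, self.private[addr])
        self.dirty.clear()
        self.private.clear()


def demo():
    shared = SharedCache()
    writer, reader = Core(shared), Core(shared)
    reader.load(0x10)              # reader caches the initial value
    writer.store(0x10, 42)         # not yet visible to reader
    stale = reader.load(0x10)      # stale read before the sync point
    writer.sync()
    reader.sync()
    fresh = reader.load(0x10)      # refetched after self-invalidation
    return stale, fresh
```

In this model no invalidation or acknowledgement message is ever sent from one core to another; each core restores a consistent view on its own when it reaches a synchronization point, which is why no record of sharers is needed.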