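To map application threads more appropriately onto the individual cores of a many-core processor, a thread grouping and mapping method is proposed that exploits data locality within each thread and data correlation between threads, and takes the characteristics of the underlying hardware architecture into account. Intra-thread data locality is analyzed by computing data reuse distances, and inter-thread data correlation is measured with a thread correlation matrix. Based on the data correlation of the application and the hardware architecture of the many-core processor, a data-correlation subtree generation algorithm divides the application threads into logical groups that reflect the data access characteristics of the different threads. On top of this logical grouping, threads are bound to processing cores so that each thread is mapped to a hardware thread of a specific core. Experimental results show that, compared with a traditional mapping method, the proposed method improves computing performance by 14% on average and reduces energy consumption by 12% without incurring additional runtime overhead. The method maps threads to the cores of a specific many-core processor according to the data correlation between the application's threads, and improves the computing efficiency of many-core systems without introducing additional runtime overhead.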
A thread grouping and mapping mechanism is proposed to map application threads appropriately onto specific processing cores of a many-core processor according to the characteristics of the application. The mechanism is based on intra-thread data locality and inter-thread data correlation, combined with the features of the many-core processor's hardware architecture. Intra-thread data locality is analyzed by computing the data reuse distance, and inter-thread data correlation is quantified with an affinity matrix. Threads are divided into logical groups by an algorithm that generates affinity spanning subtrees. The mapping from application to cores is then realized by binding each thread to a processing core. Experimental results show that, compared with a traditional mapping mechanism, the proposed mechanism obtains nearly 14% improvement in computing performance and a 12% reduction in energy consumption without introducing additional runtime overhead. The mechanism thus maps application threads appropriately onto specific processing cores of many-core processors and improves the computing efficiency of many-core systems.
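The pipeline summarized above (reuse distance, affinity matrix, grouping) can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not confirm: each thread's memory accesses are available as a list of addresses, affinity between two threads is taken to be the number of distinct addresses they share, and a greedy merge of the heaviest affinity edges, capped at `group_size` threads per group, stands in for the affinity spanning subtree algorithm, whose actual definition is not given here. The names `reuse_distances`, `affinity_matrix`, `group_threads`, and `group_size` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def reuse_distances(trace):
    """For each access, count the distinct addresses touched since the
    previous access to the same address (None on a first access)."""
    last_seen = {}
    distances = []
    for i, addr in enumerate(trace):
        if addr in last_seen:
            distances.append(len(set(trace[last_seen[addr] + 1:i])))
        else:
            distances.append(None)
        last_seen[addr] = i
    return distances


def affinity_matrix(traces):
    """Assumed metric: affinity[i][j] is the number of distinct addresses
    accessed by both thread i and thread j."""
    n = len(traces)
    touched = [set(t) for t in traces]
    matrix = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = len(touched[i] & touched[j])
    return matrix


def group_threads(affinity, group_size):
    """Greedy stand-in for the grouping step: merge threads along the
    heaviest affinity edges, never exceeding group_size per group
    (for example, the number of hardware threads per core)."""
    n = len(affinity)
    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = sorted(
        ((affinity[i][j], i, j) for i, j in combinations(range(n), 2)),
        key=lambda e: (-e[0], e[1], e[2]),
    )
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj and size[ri] + size[rj] <= group_size:
            parent[rj] = ri
            size[ri] += size[rj]

    groups = defaultdict(list)
    for t in range(n):
        groups[find(t)].append(t)
    return sorted(groups.values())


# Four hypothetical threads: 0 and 1 share addresses, as do 2 and 3.
traces = [[1, 2, 3], [2, 3, 4], [10, 11], [11, 12]]
groups = group_threads(affinity_matrix(traces), group_size=2)
```

In this toy input, threads 0 and 1 end up in one group and threads 2 and 3 in another. Each group could then be pinned to one core; on Linux, `os.sched_setaffinity` is one way to do this from Python, although the binding mechanism used in the work itself is not stated in the abstract.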