The Intel Xeon Phi coprocessor, one of the most representative many-core products available today, offers applications a powerful hardware environment and abundant computing resources. However, the memory design adopted by Xeon Phi suffers from high access latency, so it relies heavily on cache data prefetching to improve memory access performance. Java, a widely used language with automatic memory management, currently has no memory-access optimizations that target the Xeon Phi architecture. In this paper, we study the cache prefetching mechanism of Xeon Phi in detail and design and implement a dynamic runtime prefetching solution inside the HotSpot virtual machine. Compared with traditional static methods and existing dynamic prefetching schemes, our solution is better suited to the Xeon Phi many-core architecture and the dynamic Java runtime environment. Experimental results show that, on the Xeon Phi many-core platform, the solution achieves an average single-threaded speedup of 2.5x and improves the best multi-threaded throughput by 40%.