单指令流多数据流(SIMD)是实现数据级并行的有效方法,但访问地址非对齐的数据严重影响程序的向量化,造成处理器性能下降。为降低非对齐访存延时,对高性能应用程序的访存结构进行建模,设计并实现SIMD分离缓冲行非对齐访存结构与双体cache非对齐访存结构。实验结果表明,在双体cache非对齐访存结构下,通过两数组相加与SIMD向量化实现的非对齐访存代码可达到对齐访存代码性能的99%,提高了SIMD向量化的访存效率。
Single Instruction Multiple Data (SIMD) is an effective way to exploit data-level parallelism, but accesses to data at unaligned addresses seriously hinder program vectorization and degrade processor performance. To reduce the latency of unaligned memory accesses, the memory access structure of high-performance applications is modeled, and two unaligned access structures are designed and implemented: a SIMD split-buffer-line structure and a dual-bank cache structure. Experimental results show that, with the dual-bank cache structure, SIMD-vectorized code that adds two arrays using unaligned accesses reaches 99% of the performance of the equivalent aligned code, which improves the memory access efficiency of SIMD vectorization.
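To make the cost of unaligned access concrete, the following is a minimal sketch of an address model, not the structure implemented in this work. It counts how many vector loads in a unit-stride loop (such as one operand stream of a two-array addition) straddle a cache line boundary. The 64-byte line size, the 32-byte vector width, the even/odd mapping of lines to two banks, and all function names are assumptions chosen for illustration; the abstract does not specify them.

```python
# Illustrative model only: LINE_BYTES, VECTOR_BYTES and the even/odd
# bank mapping are assumptions, not parameters taken from the paper.
LINE_BYTES = 64    # assumed cache line size in bytes
VECTOR_BYTES = 32  # assumed SIMD vector width in bytes


def lines_touched(addr, width=VECTOR_BYTES, line=LINE_BYTES):
    """Number of cache lines covered by a `width`-byte access at `addr`."""
    first = addr // line
    last = (addr + width - 1) // line
    return last - first + 1


def split_accesses(base, n_vectors, width=VECTOR_BYTES, line=LINE_BYTES):
    """Count the vector loads of a unit-stride loop starting at `base`
    that straddle a cache line boundary."""
    return sum(
        1
        for i in range(n_vectors)
        if lines_touched(base + i * width, width, line) > 1
    )


def banks_touched(addr, width=VECTOR_BYTES, line=LINE_BYTES):
    """Banks hit by an access, assuming even-numbered lines live in
    bank 0 and odd-numbered lines in bank 1 (hypothetical mapping)."""
    first = addr // line
    last = (addr + width - 1) // line
    return sorted({index % 2 for index in range(first, last + 1)})


if __name__ == "__main__":
    # Aligned base address: no load crosses a line boundary.
    print(split_accesses(base=0, n_vectors=8))
    # Base address offset by 8 bytes: every second load crosses a boundary.
    print(split_accesses(base=8, n_vectors=8))
    # A crossing access touches one line in each assumed bank.
    print(banks_touched(48))
```

Under these assumptions, an access no wider than one line that crosses a boundary always touches two adjacent lines, and adjacent lines fall in different banks. This is one plausible reason a two-bank organization could serve both halves of such an access in parallel rather than in two sequential accesses.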