BLAS (Basic Linear Algebra Subprograms) is a collection of subroutines that efficiently solve linear algebra problems on high-performance computer systems. Building on common optimization techniques and on the characteristics of the Loongson 2F architecture, this paper proposes new optimization techniques for data prefetching and instruction scheduling that fully exploit the performance of the Loongson 2F processor, and implements a high-performance BLAS on the KD-50-Ⅰ platform. Measurements show that on a 750 MHz Loongson 2F processor (double-precision floating-point peak of 3 Gflops), HPL with the high-performance BLAS reaches a measured peak of 1.47 Gflops, which is more than 6 times higher than with the original BLAS and 45% higher than with ATLAS.
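As a rough illustration of the two kinds of optimization named above, the following sketch shows a Level-1 BLAS style update (y := alpha*x + y) with software prefetch hints and a 4-way unrolled loop body that gives the compiler independent operations to schedule. It is not the kernel from this work: the function name `daxpy_sketch`, the prefetch distance `PREFETCH_DIST`, and the unroll factor are assumptions chosen for illustration, and `__builtin_prefetch` is a GCC/Clang builtin rather than anything specific to Loongson 2F.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distance in elements; a real kernel would tune
 * this to the cache line size and memory latency of the target CPU. */
#define PREFETCH_DIST 64

/* y := alpha * x + y, unrolled by 4 with software prefetch hints.
 * Illustrative only; not the optimized kernel described in the paper. */
void daxpy_sketch(size_t n, double alpha, const double *x, double *y)
{
    assert(n == 0 || (x != NULL && y != NULL));

    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        /* Request data that will be needed a few iterations ahead.
         * The bounds check keeps the address expression inside the arrays. */
        if (i + PREFETCH_DIST < n) {
            __builtin_prefetch(&x[i + PREFETCH_DIST], 0, 0); /* read  */
            __builtin_prefetch(&y[i + PREFETCH_DIST], 1, 0); /* write */
        }

        /* Load first, then compute: four independent multiply-adds that
         * the compiler or hardware can overlap. */
        double x0 = x[i], x1 = x[i + 1], x2 = x[i + 2], x3 = x[i + 3];
        y[i]     += alpha * x0;
        y[i + 1] += alpha * x1;
        y[i + 2] += alpha * x2;
        y[i + 3] += alpha * x3;
    }

    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; ++i) {
        y[i] += alpha * x[i];
    }
}
```

In practice the prefetch distance and unroll factor would be chosen by measurement on the target processor. The values here are placeholders.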