To vectorize loops more fully at compile time, a loop peeling technique for vectorization based on cost analysis is presented. First, the loop fragments that carry no dependence within the iteration dependence interval are marked. A simple and effective cost analysis model is then built to estimate the CPU clock cycles that each of these fragments takes in vectorized and non-vectorized form. Finally, the results of the cost analysis determine whether a fragment should be split off and vectorized, so that vectorization can also be applied to short loop fragments. The experimental results demonstrate the effectiveness of the technique and show that the optimization is more pronounced for loop fragments with a large iteration dependence distance.
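
The decision procedure outlined above can be illustrated with a minimal sketch. It is not the model from the paper: the cycle counts, the vectorization factor, and the names `CostParams`, `split_fragments`, `scalar_cost`, `vector_cost`, and `should_vectorize` are all hypothetical, chosen only to show how a fragment bounded by the dependence distance might be compared in scalar and vectorized form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostParams:
    """Hypothetical per-target cycle counts (assumed, not from the paper)."""
    scalar_cycles: int = 4   # cycles per scalar iteration
    vector_cycles: int = 6   # cycles per vector iteration covering `vf` elements
    setup_cycles: int = 10   # one-off overhead for each vectorized fragment
    vf: int = 4              # vectorization factor (number of lanes)


def split_fragments(n: int, distance: int) -> list[tuple[int, int]]:
    """Split iterations [0, n) into half-open fragments of at most `distance`
    iterations. For a loop such as a[i] = a[i - distance] + b[i], two
    iterations in the same fragment are less than `distance` apart, so
    neither depends on the other."""
    if distance <= 0:
        raise ValueError("dependence distance must be positive")
    return [(start, min(start + distance, n)) for start in range(0, n, distance)]


def scalar_cost(length: int, p: CostParams) -> int:
    """Estimated cycles when the fragment is left as scalar code."""
    return length * p.scalar_cycles


def vector_cost(length: int, p: CostParams) -> int:
    """Estimated cycles when the fragment is vectorized: setup overhead,
    full vector iterations, and a scalar remainder."""
    full, rem = divmod(length, p.vf)
    return p.setup_cycles + full * p.vector_cycles + rem * p.scalar_cycles


def should_vectorize(length: int, p: CostParams) -> bool:
    """Vectorize a fragment only if the estimate is strictly cheaper."""
    return vector_cost(length, p) < scalar_cost(length, p)
```

Under these assumed numbers, a fragment of 8 iterations is estimated at 22 cycles vectorized against 32 scalar, while a fragment of 2 iterations costs 18 against 8. Longer dependence distances therefore yield fragments that are more likely to pay off, which agrees with the trend reported in the experiments.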