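To address the shortcomings of vector compilation, this paper proposes a loop-unrolling-based method for automatically generating subword-parallel instructions. The method applies conventional loop transformation techniques to the loops in multimedia applications that admit subword parallelism and generates subword-parallel code for them. It first identifies the parallelizable loops. It then increases the subword parallelism within the basic block of each loop body through loop unrolling, register renaming, and instruction merging. Using this method, automatic generation of subword-parallel instructions was implemented in the compiler framework for the TTA (transport triggered architecture). Experiments show that the method achieves good speedups.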
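The unroll, rename, and merge steps can be illustrated on a simple byte-array addition. The following is a minimal C sketch, not the paper's TTA compiler output. It assumes 8-bit unsigned lanes packed into a 32-bit word with wrap-around arithmetic. Because portable C has no subword-parallel add instruction, the merged instruction is emulated with a well-known SWAR (SIMD within a register) bit trick. The names `add_u8_scalar`, `add4_u8`, and `add_u8_swar` are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Original scalar loop: one 8-bit add per iteration. */
void add_u8_scalar(const uint8_t *a, const uint8_t *b, uint8_t *c, size_t n) {
    for (size_t i = 0; i < n; i++)
        c[i] = (uint8_t)(a[i] + b[i]);
}

/* Stand-in for a subword-parallel add instruction (assumption: four
   wrap-around 8-bit lanes in a 32-bit word). The low 7 bits of every lane
   are added with the top bits masked off, so no carry can cross a lane
   boundary; each lane's top bit is then restored with XOR. */
static uint32_t add4_u8(uint32_t x, uint32_t y) {
    uint32_t low = (x & 0x7F7F7F7Fu) + (y & 0x7F7F7F7Fu);
    return low ^ ((x ^ y) & 0x80808080u);
}

/* After unrolling by 4, the four copies of the body operate on renamed
   registers and are independent, so they can be merged into one
   subword-parallel add. memcpy packs and unpacks the lanes without
   alignment or aliasing problems. */
void add_u8_swar(const uint8_t *a, const uint8_t *b, uint8_t *c, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        uint32_t va, vb, vc;
        memcpy(&va, a + i, 4);   /* pack a[i..i+3] */
        memcpy(&vb, b + i, 4);   /* pack b[i..i+3] */
        vc = add4_u8(va, vb);    /* one merged add replaces four scalar adds */
        memcpy(c + i, &vc, 4);   /* unpack into c[i..i+3] */
    }
    for (; i < n; i++)           /* leftover iterations stay scalar */
        c[i] = (uint8_t)(a[i] + b[i]);
}
```

The remainder loop handles trip counts that are not a multiple of the unroll factor. This is the same concern a compiler must address when it unrolls a loop whose bound is unknown at compile time.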
Well-known parallelization techniques can be used to exploit subword parallelism. Loop unrolling, register renaming, and induction variable expansion prove valuable for achieving this goal. We evaluated the performance of the code generated by our method on a number of benchmarks. The results show that our compiler improves performance over code generated without exploiting subword parallelism.
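Induction variable expansion can be illustrated on a loop that adds a constant to an array of 16-bit values. The following is a minimal C sketch, not the paper's implementation. In a naively unrolled body, every copy reads and updates the same `i`, which forms a serial dependence chain. Expansion instead gives each copy its own induction variable (`i0`..`i3`), and each one advances by the unroll factor. The element type, the constant-add body, and the function names are illustrative assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference loop: one induction variable, one element per iteration. */
void add_const_u16_scalar(const uint16_t *a, uint16_t *c, size_t n, uint16_t k) {
    for (size_t i = 0; i < n; i++)
        c[i] = (uint16_t)(a[i] + k);
}

/* Unrolled by 4 with induction variable expansion: i0..i3 are independent
   and each steps by 4, so the four body copies no longer wait on a shared
   update of i and can be scheduled or merged independently. */
void add_const_u16_ive(const uint16_t *a, uint16_t *c, size_t n, uint16_t k) {
    size_t i0 = 0, i1 = 1, i2 = 2, i3 = 3;
    for (; i3 < n; i0 += 4, i1 += 4, i2 += 4, i3 += 4) {
        c[i0] = (uint16_t)(a[i0] + k);
        c[i1] = (uint16_t)(a[i1] + k);
        c[i2] = (uint16_t)(a[i2] + k);
        c[i3] = (uint16_t)(a[i3] + k);
    }
    for (; i0 < n; i0++)  /* remainder: fewer than 4 elements */
        c[i0] = (uint16_t)(a[i0] + k);
}
```

When the main loop exits, `i3 >= n` holds, so at most three elements remain for the scalar remainder loop. The four independent 16-bit adds in the body are the kind of operations that instruction merging can then pack into a single subword-parallel instruction.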