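In processors with a cache, code layout and instruction prefetching are common techniques for reducing instruction fetch latency. Code layout concerns the relative spatial positions of executed code, while instruction prefetching concerns the relative timing of code execution. On-chip trace non-intrusively captures a program's execution path and timing information, linking the spatial and temporal relations of code execution, and therefore provides a basis for combining layout and prefetching. On the YHFT-DSP platform, prefetches are set according to the periodic behavior of the running program and are executed on idle units of the VLIW processor, and a function-level code layout method is proposed whose goal is to increase the prefetch margin. Experimental results show that the method prefetches effectively and reduces instruction cache misses.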
Code layout and instruction prefetch are effective methods for reducing instruction fetch delays in processors with an instruction cache. Code layout adjusts the relative spatial positions of executed code, while instruction prefetch exploits the relative timing of executed code. So far there has been little research on combining them to get better results. On-chip trace is a new debug technique that records the whole program path and time marks non-intrusively with special hardware. It connects the spatial and temporal relations of code execution, and can therefore support the combination of code layout and instruction prefetch. On the YHFT-DSP platform, which has an instruction cache and an on-chip trace system, instructions are prefetched according to program phase behavior taken from the program execution path, and function code is reordered by a prefetch-oriented layout to achieve sufficient prefetch intervals. Prefetch operations are executed by idle function units in the VLIW DSP or in place of NOP instructions to reduce overhead. Four benchmarks (JPEG encoder, floating-point FFT, LPC, and MPEG-4 encoder) are tested to evaluate the method. Test results show that the method improves prefetching performance and reduces instruction cache misses by exploiting the phase stability of the program path.
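As a rough illustration of why a phase-stable execution path makes trace-driven prefetch effective, the following toy model may help. It is a sketch under strong assumptions and is not the method evaluated on YHFT-DSP: the cache holds whole functions with LRU replacement, a prefetch completes instantly (so prefetch intervals and layout are not modelled), and the names `count_misses` and `learn_successors` are hypothetical.

```python
from collections import OrderedDict


def count_misses(trace, capacity, prefetch=None):
    """Count demand misses of a toy LRU cache that holds `capacity` functions.

    trace:    function names in execution order (as a trace unit might record).
    prefetch: optional dict mapping a function to the function that should be
              prefetched while it runs (assumed to complete instantly).
    """
    cache = OrderedDict()

    def touch(name):
        """Bring `name` into the cache; return True if it was already there."""
        if name in cache:
            cache.move_to_end(name)      # mark as most recently used
            return True
        cache[name] = True
        if len(cache) > capacity:
            cache.popitem(last=False)    # evict the least recently used entry
        return False

    misses = 0
    for name in trace:
        if not touch(name):
            misses += 1                  # demand fetch missed the cache
        if prefetch and name in prefetch:
            touch(prefetch[name])        # prefetch the predicted successor
    return misses


def learn_successors(trace):
    """Map each function to the successor most recently seen in the trace."""
    successors = {}
    for current, following in zip(trace, trace[1:]):
        successors[current] = following
    return successors


# A periodic path of four functions that does not fit a two-entry cache.
trace = ["a", "b", "c", "d"] * 5
print(count_misses(trace, 2))                           # 20
print(count_misses(trace, 2, learn_successors(trace)))  # 1
```

In this model every access misses without prefetching, because the loop over four functions thrashes a two-entry LRU cache. With successors learned from the recorded path, only the first access misses. A real system must also guarantee that each prefetch is issued early enough to complete, which is the role the abstract assigns to the function-level layout.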