超长指令字(Very Long Instruction Word,Vuw)处理器一般采用总线互连的多簇结构,每个簇中的功能单元共享一个本地寄存器堆,簇间采用总线传输数据,以避免功能单元增多时,全连通结构的延时、面积和功耗的快速增长;但簇间数据共享时的拷贝和延时,使得处理器在性能上有所下降.文中提出了一种寄存器堆互连的多簇VLIW结构,采用寄存器堆来连接各个簇,从而可以避免簇间数据传输的延时和额外的数据拷贝操作.同时也提出了针对这种结构的指令调度算法,以提高指令调度的性能.实验结果表明,与全连通的VLIW结构相比,寄存器堆互连结构在性能上仅有13%左右的性能下降,代码长度则基本不变;这都优于总线互连的多簇结构.
Generally VLIW(Very Long Instruction Word) processors are implemented as busconnectivity clustered architecture, in which the function units in a cluster only access the corresponding local registers and different clusters are connected by buses. This architecture can avoid aggressive growing of delay, area and power in full-connectivity VLIW processors when function units increase. However, performance degradation is induced by its copy operations and latency of communications between clusters. This paper presents a new clustered architecture, in which a register file is used to connect all the clusters so as to turn copy and latency away. This paper also gives instruction scheduling algorithm to improve the performance. The experimental results in- dicate that this new architecture under the help of this scheduling algorithm shows only 13% performance degradation and little code size increase in average compared with those of fully tivity VLIW architecture, which prevails that of bus-connectivity clustered VLIW archite connec cture.