随着CMOS工艺进入纳米时代,工艺尺寸的不断缩小增加了集成电路对瞬态故障与永久故障的敏感性.在片上网络中提供容错支持对于提高单芯片多处理器片上数据传输的可靠性至关重要.为了处理片上网络中的瞬态故障与永久故障链路,提出一种可配置双向链路的容错偏转路由器BiFTDR.相邻BiFTDR路由器之间采用一对可配置方向的双向链路互连,根据链路的故障状态和路由器的到达包信息对双向链路的方向进行动态配置,在单向链路故障的情况下不需要绕道路由即可实现容错,并且不需要路由表从而降低了路由器的硬件实现开销.模拟结果表明,在合成通信模式下,网络中包含5条和15条永久故障链路的情况下,BiFTDR路由器的包平均延迟比一种基于强化学习的容错偏转路由器分别少10%和19%;在真实应用运行踪迹通信模式下,与无故障网络的包平均延迟相比,BiFTDR路由器的性能损失不到1%.对于瞬态故障,即使在高故障率下BiFTDR路由器的性能下降程度也较小.在65nm工艺下对BiFTDR路由器进行综合,能达到500MHz的时钟频率,并且具有较小的面积和功耗开销.
As CMOS technology scales into the nanometer regime, shrinking feature sizes make integrated circuits increasingly susceptible to transient and permanent faults. Fault-tolerance support in the network-on-chip (NoC) is therefore essential for reliable on-chip data transmission in chip multiprocessors. To handle transient and permanent link faults in the NoC, this paper proposes BiFTDR, a fault-tolerant deflection router with reconfigurable bidirectional links. Neighboring BiFTDR routers are connected by a pair of bidirectional links whose directions are reconfigured dynamically according to the fault status of the links and the information carried by arriving packets. When a unidirectional link fails, BiFTDR achieves fault tolerance without detour routing (misrouting), and because it needs no routing table, its hardware overhead is reduced. Simulation results show that, under synthetic traffic patterns, BiFTDR achieves 10% and 19% lower average packet latency than a reinforcement-learning-based fault-tolerant deflection router with 5 and 15 permanent faulty links in the network, respectively. Under real application traffic traces, the performance loss of BiFTDR relative to the average packet latency of a fault-free network is less than 1%. For transient faults, the performance of BiFTDR degrades only slightly, even at high fault rates. Synthesized in a 65 nm process, the BiFTDR router reaches a clock frequency of 500 MHz with small area and power overhead.
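
The direction configuration described above can be illustrated with a minimal Python sketch. It is not the BiFTDR arbitration logic, which the abstract does not specify; the function name `configure_link_pair`, the `Dir` enum, the boolean demand flags, and the cycle-parity tie-break are all assumptions made for illustration. The sketch only shows the general idea: a healthy link of the pair can be pointed in whichever direction has packets waiting, so a single link fault does not force a detour.

```python
from enum import Enum


class Dir(Enum):
    A_TO_B = "A->B"
    B_TO_A = "B->A"
    OFF = "off"  # link is faulty and carries no traffic


def configure_link_pair(faulty, demand_ab, demand_ba, cycle):
    """Pick a direction for each of the two links between routers A and B.

    faulty    -- pair of bools, True if that link is currently faulty
    demand_ab -- True if router A has a packet to send to B this cycle
    demand_ba -- True if router B has a packet to send to A this cycle
    cycle     -- current cycle number, used only as a hypothetical
                 fairness tie-break when one link must serve both sides
    """
    healthy = [i for i in (0, 1) if not faulty[i]]
    dirs = [Dir.OFF, Dir.OFF]

    if len(healthy) == 2:
        # Both links work: give both to the only side with demand,
        # otherwise fall back to one link per direction.
        if demand_ab and not demand_ba:
            dirs = [Dir.A_TO_B, Dir.A_TO_B]
        elif demand_ba and not demand_ab:
            dirs = [Dir.B_TO_A, Dir.B_TO_A]
        else:
            dirs = [Dir.A_TO_B, Dir.B_TO_A]
    elif len(healthy) == 1:
        # One link is faulty: the surviving link serves the side with
        # demand, alternating by cycle parity if both sides want it.
        i = healthy[0]
        if demand_ab and demand_ba:
            dirs[i] = Dir.A_TO_B if cycle % 2 == 0 else Dir.B_TO_A
        elif demand_ba:
            dirs[i] = Dir.B_TO_A
        else:
            dirs[i] = Dir.A_TO_B
    # If both links are faulty, both stay OFF.
    return dirs
```

For example, `configure_link_pair((True, False), False, True, 0)` turns the surviving second link toward A, so the packet waiting at B still reaches its neighbor directly.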