Implementing complete ECC algorithms in hardware offers better security, lower communication bandwidth, and more convenient hardware/software co-design. This paper presents TTA-EC, a whole-algorithm ECC processor based on the transport triggered architecture (TTA), extended with a long-integer register file to support large-number arithmetic and a modular multiplication unit to accelerate modular multiplication. TTA-EC has the following features: (1) a complete ECC public-key system based on TTA-EC can be developed quickly with the TTA tool chain; (2) the modular multiplier combines a high-radix Montgomery algorithm, whose radix equals the processing word length, with a row-sharing pipeline structure to achieve high performance and good scalability; (3) the pipeline elements perform vector multiplication and support both finite fields GF(p) and GF(2^n); (4) different performance/area constraints can be met by adjusting the bus width and the number of pipeline elements in the modular multiplier. In a 0.18 μm 1P6M CMOS process, the high-performance version uses 117.4K gates and completes a 192-bit EC scalar multiplication over GF(p) or GF(2^n) in 0.87 ms, while the compact version uses 40.6K gates and completes the same operation in 7.83 ms. Their peak power consumption is 242.1 mW and 28.5 mW, respectively.
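To make the high-radix Montgomery idea concrete, the following is a minimal software sketch of generic word-serial Montgomery multiplication over GF(p), in which the radix 2^w equals the processing word length. It is not the paper's radix-length-based variant or its row-sharing pipeline, and it does not cover GF(2^n). The names `mont_setup` and `mont_mul`, the 32-bit word width, and the use of the NIST P-192 prime as a 192-bit example modulus are all assumptions made for illustration.

```python
def mont_setup(p, w):
    """Precompute constants for word-serial Montgomery arithmetic (hypothetical helper)."""
    if p % 2 == 0:
        raise ValueError("Montgomery reduction needs an odd modulus")
    s = -(-p.bit_length() // w)               # number of w-bit words covering p
    r = 1 << (w * s)                          # R = 2^(w*s), guaranteed > p
    p_inv = (-pow(p, -1, 1 << w)) % (1 << w)  # -p^(-1) mod 2^w (needs Python 3.8+)
    return s, r, p_inv


def mont_mul(a, b, p, w, s, p_inv):
    """Return a * b * R^(-1) mod p for 0 <= a, b < p, consuming a one w-bit word at a time."""
    mask = (1 << w) - 1
    t = 0
    for i in range(s):
        a_i = (a >> (w * i)) & mask       # i-th word of a
        t += a_i * b                      # accumulate the partial product
        m = ((t & mask) * p_inv) & mask   # multiple of p that clears the low word
        t = (t + m * p) >> w              # exact division by the radix 2^w
    return t - p if t >= p else t         # single conditional subtraction


# Assumed example parameters: 192-bit modulus (NIST P-192 prime), 32-bit words.
P192 = 2**192 - 2**64 - 1
W = 32
S, R, P_INV = mont_setup(P192, W)

a = 0x1234567890ABCDEF1234567890ABCDEF1234567890ABCDEF % P192
b = 0x0FEDCBA987654321FEDCBA987654321FEDCBA9876543210 % P192

a_m = a * R % P192                            # convert to the Montgomery domain
b_m = b * R % P192
ab_m = mont_mul(a_m, b_m, P192, W, S, P_INV)  # product, still in Montgomery form
ab = mont_mul(ab_m, 1, P192, W, S, P_INV)     # multiply by 1 to leave the domain
```

Each loop iteration handles one w-bit word of the operand, so a wider word means fewer iterations at the cost of a wider multiplier. This is the same kind of trade-off the abstract describes for bus width and the number of pipeline elements, although the hardware mapping itself is not modeled here.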