The interconnection network switch is a core component of a high-performance computing (HPC) system. Scientific computing, the main upper-layer application of HPC systems, demands not only low latency and high bandwidth from the switch but also hardware support for collective communication such as broadcast, multicast, and barrier synchronization. The HyperLink switch, a key component of the Dawning 5000 interconnection network, has a single-stage latency of 38.4 ns and an aggregate bandwidth of 160 Gbps, and supports 16 multicast groups and 16 barrier groups simultaneously. Under ideal conditions, a multicast or barrier operation across 1024 nodes completes within 2 μs, which greatly accelerates scientific applications. To evaluate the performance of the HyperLink switch, a cycle-accurate simulation model was built. Simulation shows that, for a 16-port input-buffered switch, 3 virtual channels give the best cost-performance trade-off, and that with a 1 KB MTU a 4 KB input buffer is sufficient to reach the highest unicast throughput. A theoretical analysis compares a multi-rail network with a single-rail network of the same total bandwidth; it shows that the former effectively reduces network latency and can therefore deliver higher network throughput than the latter. The LogP model is used to analyze HyperLink multicast and barrier performance; the analysis shows that the HyperLink switch scales well and can support systems of up to several thousand nodes.
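As a rough illustration of the kind of LogP-style estimate mentioned above, the following Python sketch compares a hardware multicast with a software binomial-tree broadcast. Only the 38.4 ns stage latency, the 16-port radix, and the 1024-node scale come from the abstract. The fat-tree topology, the per-message overhead `overhead_ns`, the per-link delay `wire_ns`, and all function names are assumptions introduced here for illustration; the sketch also ignores the LogP gap parameter g and is not the model used in the paper.

```python
STAGE_LATENCY_NS = 38.4   # single-stage switch latency (from the abstract)
SWITCH_PORTS = 16         # switch radix (from the abstract)


def fat_tree_levels(nodes, ports=SWITCH_PORTS):
    """Levels of a fat tree of `ports`-port switches needed for `nodes` nodes.

    Assumed topology: each switch uses half its ports downward, so
    n levels connect up to 2 * (ports / 2) ** n nodes.
    """
    half = ports // 2
    levels = 1
    while 2 * half ** levels < nodes:
        levels += 1
    return levels


def network_latency_ns(nodes, wire_ns):
    """Worst-case one-way latency L: up to the root switch and back down."""
    hops = 2 * fat_tree_levels(nodes) - 1        # switches on the longest path
    return hops * STAGE_LATENCY_NS + (hops + 1) * wire_ns


def hardware_multicast_ns(nodes, overhead_ns, wire_ns):
    """Switch replicates the packet: one send overhead, L, one receive overhead."""
    return overhead_ns + network_latency_ns(nodes, wire_ns) + overhead_ns


def software_broadcast_ns(nodes, overhead_ns, wire_ns):
    """Binomial-tree broadcast: ceil(log2 P) rounds, each costing L + 2o."""
    rounds = (nodes - 1).bit_length()            # ceil(log2(nodes)) for nodes >= 1
    per_message = overhead_ns + network_latency_ns(nodes, wire_ns) + overhead_ns
    return rounds * per_message


if __name__ == "__main__":
    # Hypothetical endpoint overhead and link delay, chosen only for illustration.
    o, w = 300.0, 10.0
    print("levels:", fat_tree_levels(1024))
    print("hardware multicast (ns):", hardware_multicast_ns(1024, o, w))
    print("software broadcast (ns):", software_broadcast_ns(1024, o, w))
```

Under these assumptions the hardware multicast cost grows only with the number of switch levels, whereas the software broadcast cost is multiplied by the number of tree rounds, which is the qualitative reason hardware collective support scales well.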