As problem sizes and system scales continue to grow, cluster-based high-performance computers face challenges in scalability, reliability, power consumption, footprint, and system balance, among others. This paper presents the design and implementation of a high-productivity computing node, covering the computing module, the switch and management modules, adaptive power management, a dedicated FPGA-based hardware accelerator, and a high-speed, fully switched PCI-E expansion module. The Dawning 5000A 100-teraflops supercomputer built on this node effectively addresses bottlenecks in computing density, I/O expansion and bandwidth, and energy consumption.