全序组播是构建分布式应用程序的一种重要组通信原语,它能够保证一个通信组中的所有成员都按照同样的顺序接收消息.目前的全序组播算法不能同时获取低延迟和高吞吐量,并且缺乏对应用程序通信模式的适应性,因此不适用于高性能计算环境.在分析已有算法排序机制基础上,指出影响全序组播算法性能的关键因素,并提出一种基于leader/followers模式和阻塞检测机制的新算法.算法工作原理如下:每一个组成员都可以在任意时刻发送消息,但只能提交来自当前leader成员的消息;一旦leader成员进入不活跃状态,则通过特殊的命令来指定某个活跃的follower成员为新的leader成员.模拟实验结果表明,该算法在延迟时间和吞吐量等性能指标方面都优于已有算法,同时在突发消息模式下能够大幅度提升性能.
Total order broadcast is an important group communication primitive for building fault-tolerant distributed applications, and it guarantees that all members in a communication group receive messages in the same order even if some members are faulty. The existing total order broadcast algorithms can not achieve both low latency and high throughput at the same time, and lack adaptability for the communication patterns of applications, and thus they are not suitable for high performance computing environments. In analyzed in this paper are the ordering mechanisms in some existing typical algorithms, and the key factors that affect the performance of total order broadcast algorithms are pointed out. Then a novel algorithm is proposed, which builds the total order using the leader/followers pattern and is driven by block detection mechanism. It works as follows: each group member can send messages at any time, but only messages from the current leader are delivered, and if the leader remains inactive, it will issue a special request to change the leadership to one of the active follower members. Simulation experiments are performed and the results show that the new algorithm achieves good performance in terms of both latency and throughput, and is much more efficient under bursty message arrival pattern.