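To address the high system cost and long recovery time of existing stable-storage-based checkpointing for cluster services, this paper proposes a shared-memory-based checkpointing mechanism for cluster services. A checkpoint information management protocol is designed for the primary-backup storage mode of shared-memory-based checkpoint information, ensuring the consistency of cluster service checkpoint information. A checkpoint group management protocol based on a unidirectional logical ring is designed, ensuring a consistent membership view among the checkpoint processes in the logical checkpoint backup ring. Performance experiments show that the mechanism achieves good checkpoint read and write performance, that the group management protocol incurs low system overhead, and that the mechanism meets the checkpointing needs of cluster services well.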
To overcome the relatively low performance-to-cost ratio of secondary-storage-based checkpointing for cluster services, this paper presents a shared-memory-based checkpointing mechanism for cluster services. The idea of the proposed checkpointing mechanism is to keep checkpoints in shared memory, thereby reducing checkpointing and recovery latency compared with secondary-storage-based checkpointing. To lower the risk posed by the non-persistent nature of shared memory, all checkpoint servers in the cluster are organized as a single-directed ring. For each cluster service, the checkpoint data is stored both on the local checkpoint server and on that server's predecessor in the single-directed checkpoint ring. A checkpoint management protocol is designed for the dual-stored checkpoint data to ensure that checkpoint updates remain consistent. A group membership protocol is presented to guarantee that all members of the single-directed checkpoint ring hold a consistent group view, so that the checkpoint data is backed up correctly. The experimental results show that the shared-memory-based checkpointing mechanism achieves lower checkpointing and recovery latency. The group membership protocol needs only one round of communication to achieve group view consistency among all checkpoint servers, and hence incurs low communication overhead.
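The dual-storage idea described above can be illustrated with a minimal in-memory sketch. It is not the paper's protocol: the class name `CheckpointRing`, its methods, the per-service version counter, the backup-first write order, and the re-replication step after a failure are all assumptions made for illustration, and the group membership protocol is not modeled at all (the ring order is simply a Python list).

```python
class CheckpointRing:
    """Toy model of a single-directed checkpoint ring (hypothetical, illustrative only)."""

    def __init__(self, server_ids):
        self.ring = list(server_ids)  # ring order; predecessor of ring[i] is ring[i - 1]
        if len(self.ring) < 2:
            raise ValueError("need at least two checkpoint servers")
        # Volatile per-server stores standing in for shared memory.
        self.primary = {s: {} for s in self.ring}  # checkpoints of local services
        self.backup = {s: {} for s in self.ring}   # copies held for the successor

    def predecessor(self, server_id):
        i = self.ring.index(server_id)
        return self.ring[i - 1]  # index -1 wraps around to the last server

    def write(self, local_server, service, data):
        """Store a checkpoint on the local server and on its predecessor."""
        old_version = self.primary[local_server].get(service, (0, None))[0]
        record = (old_version + 1, data)
        # Assumed ordering: update the backup copy first, then the local copy.
        self.backup[self.predecessor(local_server)][service] = record
        self.primary[local_server][service] = record
        return record[0]

    def fail(self, server_id):
        """Simulate a crash; return the checkpoints that survive on the predecessor."""
        if len(self.ring) < 2:
            raise ValueError("cannot recover: no other server left in the ring")
        i = self.ring.index(server_id)
        pred = self.ring[i - 1]
        succ = self.ring[(i + 1) % len(self.ring)]
        lost = self.primary.pop(server_id)  # the crashed server's memory is gone
        self.backup.pop(server_id)
        self.ring.remove(server_id)
        backups = self.backup[pred]
        recovered = {svc: backups.pop(svc) for svc in lost if svc in backups}
        # The successor's backups lived on the crashed server; copy them to the
        # successor's new predecessor so every checkpoint is dual-stored again.
        self.backup[pred].update(self.primary[succ])
        return recovered
```

In this sketch a service's state can be restored from the returned records after its checkpoint server crashes. Note that once only one server remains, it becomes its own predecessor and the second copy no longer adds any protection.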