To meet the reliability requirements of flight data cluster processing in air traffic control (ATC) systems, this paper presents a Linux-based, user-level process checkpointing and recovery scheme. The main modules of a flight data processing cluster built on this user-level checkpointing mechanism are introduced, and the system design framework is derived from them. The implementation is then described in detail in terms of saving and restoring a process's initialized data segment, heap, stack, and open files. The scheme can restore a process to its pre-crash running state after the host crashes and restarts and, more importantly, allows a saved checkpoint to be migrated to another host in a distributed system and resumed there through process migration, which improves system reliability and reduces lost computation.