To address the shortcomings of embedded application systems, namely that they often cannot recover effectively from exceptions and that existing fault-tolerance methods are incomplete in coverage and poorly extensible, this paper proposes an easily extensible, communication-based hierarchical fault-tolerance method. Combining modularity and loose coupling, the method takes a software daemon as its core and hardware and server monitoring as its support, and links the layers through mature communication techniques. It thereby seamlessly integrates fault-tolerance techniques at the computer hardware, operating system, application, and management levels, with the aim of improving overall system reliability. The design and implementation of the key technology, a module for coordinated fault tolerance between the software daemon and the application tasks, are described in detail.
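The coordination between a software daemon and the application tasks it guards can be illustrated with a minimal sketch. This is not the paper's implementation: the heartbeat mechanism, the restart limit, and all names (`Daemon`, `TaskRecord`, `restart`, `escalate`) are assumptions chosen for illustration. The sketch shows one plausible layering, in which the daemon first attempts application-level recovery and hands the fault to a higher level (for example, a monitoring server) only when repeated restarts fail.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class TaskRecord:
    """Bookkeeping the daemon keeps for one monitored task (hypothetical)."""
    timeout: float      # assumed policy: max seconds allowed between heartbeats
    last_beat: float    # time of the most recent heartbeat (or restart)
    restarts: int = 0   # restarts attempted since the last heartbeat


class Daemon:
    """Hypothetical daemon: restarts silent tasks, escalates if that fails."""

    def __init__(self, restart: Callable[[str], None],
                 escalate: Callable[[str], None],
                 max_restarts: int = 2) -> None:
        self.restart = restart        # application-level recovery action
        self.escalate = escalate      # hand-off to a higher level
        self.max_restarts = max_restarts
        self.tasks: Dict[str, TaskRecord] = {}

    def register(self, name: str, timeout: float, now: float) -> None:
        self.tasks[name] = TaskRecord(timeout=timeout, last_beat=now)

    def heartbeat(self, name: str, now: float) -> None:
        # A heartbeat means the task is alive, so its restart budget resets.
        rec = self.tasks[name]
        rec.last_beat = now
        rec.restarts = 0

    def check(self, now: float) -> None:
        for name, rec in self.tasks.items():
            if now - rec.last_beat <= rec.timeout:
                continue  # task reported in time
            if rec.restarts < self.max_restarts:
                rec.restarts += 1
                rec.last_beat = now  # give the restarted task a fresh window
                self.restart(name)
            else:
                self.escalate(name)  # local recovery exhausted


# Usage: a task that never sends a heartbeat is restarted twice, then escalated.
events = []
daemon = Daemon(restart=lambda n: events.append(("restart", n)),
                escalate=lambda n: events.append(("escalate", n)))
daemon.register("sensor_task", timeout=1.0, now=0.0)
for t in (0.5, 2.0, 4.0, 6.0):
    daemon.check(now=t)
```

Time is passed in explicitly rather than read from a clock, so the recovery policy can be exercised deterministically. In a real system the callbacks would be bound to whatever communication channel connects the layers.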