To meet the reliability requirements of grid computing, this paper presents an adaptive grid failure detection framework. The framework comprises two key algorithms: a failure detection algorithm that operates between a pair of processes, and a failure detector management algorithm. Drawing on research into unreliable failure detection services for distributed systems and on the Relational Grid Monitoring Architecture (R-GMA), the framework organizes its failure detection services hierarchically. It dynamically adjusts system parameters and the deployment structure according to the running state of the system and user requirements. A mathematical performance analysis and an experimental evaluation show that the system has good scalability and flexibility.