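Because distributed applications are dynamic and complex, traditional manual management can no longer handle fault management well, and applying the ideas of autonomic computing to management has become one way to address the problem. This paper studies how to achieve system self-awareness based on fault diagnosis techniques. First, based on an analysis of fault management in distributed applications, a hybrid fault diagnosis model is proposed that divides the diagnosis process into two stages: application service fault diagnosis and network service fault diagnosis. Second, because observations of network fault symptoms are uncertain and inaccurate, the fault diagnosis model is mapped onto a Bayesian network for uncertainty reasoning. Finally, the paper focuses on inference algorithms for the multi-layer FPM model and presents an improved algorithm based on variable elimination; experiments show that the improved algorithm accelerates inference.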
Fault management is a key research topic in the field of distributed application management. Because distributed applications are dynamic and complex, traditional methods cannot meet the needs of fault management. Autonomic computing offers a solution by enabling system self-management. Self-management consists of two procedures: self-awareness and self-adaptation. This paper deals mainly with realizing system self-awareness based on fault diagnosis. First, a hybrid fault diagnosis model is proposed after analyzing fault propagation in distributed application management. According to this model, the fault diagnosis process is divided into two steps: application service fault diagnosis and network service fault diagnosis. Second, because observations of network faults are uncertain and inaccurate, the fault diagnosis model is mapped to a Bayesian network to carry out uncertainty reasoning. Finally, because exact inference in Bayesian networks is computationally complex, improvements are made to the original inference algorithm for diagnosing the root cause in the multi-layer Bayesian network that corresponds to the multi-layer FPM model. Experiments show that the improved algorithm accelerates the inference procedure.
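To make the inference step concrete, the following is a minimal sketch of plain variable elimination on a small layered network. It is not the improved algorithm described in the paper. The three nodes (`L` for a network-layer link fault, `S` for an application service fault, `T` for an observed timeout symptom), the chain structure, and all probabilities are hypothetical values chosen only for illustration.

```python
from itertools import product

# A factor is a pair (variables, table); the table maps a tuple of 0/1
# values, one per variable, to a non-negative number.

def multiply(f, g):
    """Pointwise product of two factors over the union of their variables."""
    fv, ft = f
    gv, gt = g
    variables = fv + tuple(v for v in gv if v not in fv)
    table = {}
    for assignment in product((0, 1), repeat=len(variables)):
        a = dict(zip(variables, assignment))
        table[assignment] = (ft[tuple(a[v] for v in fv)]
                             * gt[tuple(a[v] for v in gv)])
    return variables, table

def sum_out(f, var):
    """Marginalize a variable out of a factor."""
    fv, ft = f
    i = fv.index(var)
    table = {}
    for assignment, p in ft.items():
        key = assignment[:i] + assignment[i + 1:]
        table[key] = table.get(key, 0.0) + p
    return fv[:i] + fv[i + 1:], table

def restrict(f, var, value):
    """Fix an observed variable to its value and drop it from the factor."""
    fv, ft = f
    if var not in fv:
        return f
    i = fv.index(var)
    table = {a[:i] + a[i + 1:]: p for a, p in ft.items() if a[i] == value}
    return fv[:i] + fv[i + 1:], table

def variable_elimination(factors, evidence, order):
    """Posterior over the single variable left after eliminating `order`."""
    for var, value in evidence.items():
        factors = [restrict(f, var, value) for f in factors]
    for var in order:
        related = [f for f in factors if var in f[0]]
        if not related:
            continue
        combined = related[0]
        for f in related[1:]:
            combined = multiply(combined, f)
        factors = [f for f in factors if var not in f[0]]
        factors.append(sum_out(combined, var))
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    total = sum(result[1].values())
    return {key[0]: p / total for key, p in result[1].items()}

# Hypothetical chain L -> S -> T (illustrative numbers only).
prior_L = (("L",), {(0,): 0.9, (1,): 0.1})
cpt_S = (("L", "S"), {(0, 0): 0.95, (0, 1): 0.05, (1, 0): 0.10, (1, 1): 0.90})
cpt_T = (("S", "T"), {(0, 0): 0.98, (0, 1): 0.02, (1, 0): 0.20, (1, 1): 0.80})

# Posterior probability of a link fault given that a timeout was observed.
posterior = variable_elimination([prior_L, cpt_S, cpt_T], {"T": 1}, ["S"])
print(posterior)
```

The cost of variable elimination depends heavily on the elimination order and on the size of the intermediate factors, which is one reason exact inference in larger multi-layer networks becomes expensive.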