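Fault diagnosis agreement (FDA) is an important safeguard of the performance and integrity of highly reliable, fault-tolerant distributed systems. Most existing FDA protocols consider only simple networks with a single type of faulty component. For real distributed applications, however, a system model in which faulty nodes and faulty links coexist is more meaningful. Under this model, diagnosing malicious (Byzantine) components in a way that satisfies FDA is impossible. To address this, an invalid-link fault model is first proposed. It describes more accurately how the faulty behavior of malicious components affects the system, and it effectively improves fault diagnosis coverage. Based on this model, an evidence-based fault diagnosis protocol, PLFDA, is then proposed. PLFDA can detect and locate malicious nodes and malicious links simultaneously while satisfying the requirements of FDA.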
Abstract: Fault diagnosis agreement (FDA) maintains the performance and integrity of highly reliable distributed systems. However, most previous FDA protocols consider only simple networks with a single type of faulty component, whereas real distributed applications call for the study of more complicated networks in which faulty nodes and faulty links coexist. Unfortunately, in this setting the diagnosis of malicious (Byzantine) faulty components cannot satisfy FDA, because their behavior is arbitrary. This paper therefore first proposes an invalid-link model, which more accurately describes the effect of malicious faulty components in networks with both kinds of faulty component and improves fault diagnosis coverage. Based on the invalid-link model, it then presents an evidence-based fault diagnosis protocol, PLFDA. PLFDA collects the messages accumulated during a Byzantine agreement protocol as evidence and diagnoses the set of faulty components by examining that evidence. In a synchronous, fully connected network of $n$ nodes, PLFDA can detect and locate faulty nodes and faulty links simultaneously while satisfying the requirements of FDA, provided that the total number of faulty components is at most $\lfloor n/2 \rfloor - 1$, of which at most $\lfloor (n-1)/3 \rfloor$ are faulty nodes. Finally, the correctness proof and complexity analysis of PLFDA, together with experimental results, are presented.
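The fault-tolerance condition stated in the abstract can be checked mechanically for a given network size. The following is a minimal Python sketch, assuming that the bracket notation in the bounds denotes the floor function and that "faulty components" means faulty nodes plus faulty links counted separately. The function names `fault_bounds` and `plfda_tolerates` are hypothetical illustrations, not part of PLFDA itself; the sketch only tests whether a fault configuration falls within the stated bounds and performs no diagnosis.

```python
def fault_bounds(n: int) -> tuple[int, int]:
    """Return (max_total_faulty_components, max_faulty_nodes) for n nodes.

    Assumes the bounds floor(n/2) - 1 and floor((n-1)/3) from the abstract;
    Python's // is floor division for the non-negative values used here.
    """
    if n < 1:
        raise ValueError("n must be a positive number of nodes")
    return n // 2 - 1, (n - 1) // 3


def plfda_tolerates(n: int, faulty_nodes: int, faulty_links: int) -> bool:
    """Hypothetical check: is this fault configuration within the stated bounds?"""
    if faulty_nodes < 0 or faulty_links < 0:
        raise ValueError("fault counts must be non-negative")
    max_total, max_nodes = fault_bounds(n)
    # Both conditions must hold: the node bound and the total-component bound.
    return faulty_nodes <= max_nodes and faulty_nodes + faulty_links <= max_total


if __name__ == "__main__":
    for n in (4, 7, 10):
        print(n, fault_bounds(n))  # 4 (1, 1), then 7 (2, 2), then 10 (4, 3)
    print(plfda_tolerates(10, 3, 1))  # True: 3 <= 3 nodes, 4 <= 4 components
    print(plfda_tolerates(10, 3, 2))  # False: 5 components exceed 4
```

Note that the total-component bound can be the binding one: with $n = 10$, three faulty nodes are allowed, but then at most one additional faulty link is tolerated.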