This paper proposes a fault diagnosis method that locates faulty agents in a multi-agent system (MAS) by monitoring the messages exchanged among its agents. For application software running on a MAS platform, the method first builds a function model and, from it, a behavior model. During actual execution it then captures the communication messages among agents together with other events. Finally, a consistency-based diagnosis algorithm matches and compares the captured message sequence against the behavior model and reasons about the result to identify the agent that failed and the specific abnormal behavior it exhibited, thereby locating and diagnosing faults in the software system. On this basis, a prototype system for fault diagnosis, eHealer, was implemented. Experiments show that the method accurately locates agent-level faults in MAS software. Compared with existing methods, it offers strong real-time performance, accurate fault location, and good generality, and it provides useful information for the subsequent selection of fault recovery strategies.
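The matching step described above can be illustrated with a minimal sketch. It is not the eHealer algorithm, whose details the abstract does not give. It assumes the behavior model can be reduced to one ordered list of expected messages and that the agent expected to act at the first inconsistent step is the suspect. The names `Message`, `EXPECTED`, and `diagnose`, and the agents `client`, `broker`, and `worker`, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Message:
    """One observed or expected inter-agent message (hypothetical structure)."""
    sender: str
    receiver: str
    performative: str  # e.g. "request", "inform", "failure"


# Hypothetical behavior model: the ordered messages one interaction should produce.
EXPECTED: List[Message] = [
    Message("client", "broker", "request"),
    Message("broker", "worker", "request"),
    Message("worker", "broker", "inform"),
    Message("broker", "client", "inform"),
]


def diagnose(expected: List[Message],
             observed: List[Message]) -> Optional[Tuple[str, str]]:
    """Return (suspect_agent, reason) for the first inconsistency, or None.

    Assumption: the agent that should have sent the expected message at the
    first mismatching step is blamed. A real diagnoser would reason over
    several candidate explanations instead of a single one.
    """
    for step, want in enumerate(expected):
        if step >= len(observed):
            # The trace ended before the model did: a message is missing.
            return want.sender, f"missing {want.performative} to {want.receiver}"
        got = observed[step]
        if got != want:
            return want.sender, (
                f"expected {want.performative} to {want.receiver}, "
                f"observed {got.performative} from {got.sender}"
            )
    if len(observed) > len(expected):
        # The trace continues past the model: an unexpected extra message.
        extra = observed[len(expected)]
        return extra.sender, f"unexpected {extra.performative} to {extra.receiver}"
    return None
```

For instance, calling `diagnose(EXPECTED, EXPECTED[:2])` on a trace that stops after the broker forwards the request names `worker` as the suspect, because its reply is the first expected message that never appears. The sketch checks a complete trace after the fact. Monitoring a running system would apply the same comparison incrementally as each message is captured.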