Frequent failures seriously degrade the stability of supercomputer systems. Failure prediction plays an important role in improving that stability, and log analysis is an effective way to predict failures. This paper establishes a basic framework for failure prediction consisting of log preprocessing, base predictors, and a joint predictor, where the base predictors comprise a time predictor and an association predictor. Experiments on BlueGene/L logs show that the joint predictor achieves better prediction results than the base predictors. This indicates that failure prediction must fully exploit the characteristics of failures, and that satisfactory results are obtained only by combining base predictors built on different failure characteristics.
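The framework can be illustrated with a minimal sketch. The abstract does not specify how each predictor works or how the base predictors are combined, so everything below is an assumption made for illustration: the `Event` record, the fixed look-back window, the rule format, the event category names, and the logical-OR combination are all hypothetical and are not the paper's method.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence


@dataclass(frozen=True)
class Event:
    """One preprocessed log record (hypothetical schema)."""
    timestamp: float   # seconds since the start of the log
    category: str      # event category assigned during preprocessing
    is_failure: bool   # True for fatal events


def recent(events: Sequence[Event], now: float, window: float) -> List[Event]:
    """Return events in the half-open interval (now - window, now]."""
    return [e for e in events if now - window < e.timestamp <= now]


def time_predictor(events: Sequence[Event], now: float,
                   window: float = 300.0) -> bool:
    """Assumed time-based rule: warn if a failure occurred recently."""
    return any(e.is_failure for e in recent(events, now, window))


def association_predictor(events: Sequence[Event], now: float,
                          rules: Sequence[FrozenSet[str]],
                          window: float = 300.0) -> bool:
    """Assumed association rule: warn if all precursor categories
    of at least one rule were observed within the window."""
    seen = {e.category for e in recent(events, now, window)}
    return any(rule <= seen for rule in rules)


def joint_predictor(events: Sequence[Event], now: float,
                    rules: Sequence[FrozenSet[str]],
                    window: float = 300.0) -> bool:
    """Assumed combination: warn if either base predictor warns."""
    return (time_predictor(events, now, window)
            or association_predictor(events, now, rules, window))


# Hypothetical events and rule, not taken from the BlueGene/L log.
log = [
    Event(10.0, "ECC_WARN", False),
    Event(20.0, "LINK_RETRY", False),
    Event(500.0, "KERNEL_PANIC", True),
]
rules = [frozenset({"ECC_WARN", "LINK_RETRY"})]

warn_from_precursors = joint_predictor(log, 30.0, rules)
warn_from_recent_failure = joint_predictor(log, 600.0, rules)
warn_when_quiet = joint_predictor(log, 1000.0, rules)
```

In this sketch the joint predictor covers cases that each base predictor misses on its own: at time 30 only the association rule fires, and at time 600 only the time-based rule fires. An OR combination favors recall over precision; a weighted or learned combination would be an equally plausible reading of the abstract.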