Mining program logs is a widely used technique for detecting anomalies in application state. Existing anomaly detection methods either require considerable computation or depend on the assumption that the test logs follow a predefined probability distribution of log events. As a result, they are poorly suited to online detection and fail when that assumption does not hold. To address these problems, this paper proposes a new anomaly detection method called CADM. CADM uses the relative entropy between the test logs and normal logs as its measure of how anomalous the test logs are. Instead of computing the relative entropy directly from a predefined family of distributions, CADM exploits the relationship between relative entropy and the size of the encoding produced by an adapted grammar-based compression method, and therefore needs no prior assumption about the log distribution. In addition, CADM has a computational complexity of only O(n), so it scales well to large logs. Experiments on both synthetic logs and publicly available real-world logs show that CADM applies to a broader variety of program logs and achieves higher detection accuracy, which makes it better suited to online log mining for anomaly detection.
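To make the link between compression size and distributional mismatch concrete, the following is a minimal sketch and not CADM itself. It assumes a general-purpose compressor (Python's `zlib`) as a stand-in for the adapted grammar-based compressor, and the names `compressed_size`, `cross_rate`, and `make_log`, as well as the event strings, are hypothetical. The score is the number of extra compressed bytes per test byte once the compressor has already seen the normal logs. This roughly tracks the cross-entropy of the test logs under the normal logs, which is the entropy of the test logs plus the relative entropy; it is not a direct estimate of the relative entropy.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed data (stand-in compressor)."""
    return len(zlib.compress(data, 9))


def cross_rate(normal: bytes, test: bytes) -> float:
    """Extra compressed bytes per test byte when test follows normal.

    A low value means the test log is well predicted by what the
    compressor learned from the normal log.
    """
    if not test:
        raise ValueError("test log must be non-empty")
    extra = compressed_size(normal + test) - compressed_size(normal)
    return extra / len(test)


def make_log(events, n_lines, seed):
    """Build a toy log by sampling event lines (illustrative data only)."""
    rng = random.Random(seed)
    lines = (rng.choice(events) + "\n" for _ in range(n_lines))
    return "".join(lines).encode()


# Hypothetical event vocabularies, invented for this illustration.
NORMAL_EVENTS = ["open session", "read block", "write block", "close session"]
ODD_EVENTS = ["disk timeout", "retry write", "bad checksum",
              "abort session", "lost lock", "stale handle"]

normal = make_log(NORMAL_EVENTS, 1000, seed=1)
similar = make_log(NORMAL_EVENTS, 100, seed=2)  # same event distribution
unusual = make_log(ODD_EVENTS, 100, seed=3)     # events unseen in normal

print("similar:", cross_rate(normal, similar))
print("unusual:", cross_rate(normal, unusual))
```

In this toy setting the log built from unseen events should receive the larger score. Note that `zlib` only looks back over a 32 KB window, so this stand-in would not carry over to large normal logs the way a grammar built from the whole log could.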