To overcome the shortcomings of traditional methods in feature extraction, a method for detecting unknown malicious code based on the Lempel-Ziv-Welch (LZW) compression algorithm is proposed. The structure of the code is ignored and each file is treated as a character stream, from which strings are extracted; their length is capped by a predetermined threshold to trade off processing efficiency against detection performance. The extracted strings are then used to build, by class, two compression dictionaries that reflect the statistical characteristics of the data: a normal-code dictionary and a malicious-code dictionary. To classify a file under test, it is compressed with each dictionary, giving two compression ratios, and, following the minimum description length (MDL) principle, the file is assigned to the class whose dictionary compresses it best. Experimental results show that the LZW-based method is effective at identifying unknown malicious code.
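The procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`build_dictionary`, `compressed_bits`, `classify`), the default length threshold `max_len=8`, the greedy longest-match parsing against a fixed dictionary, and the cost of `ceil(log2(dictionary size))` bits per emitted code are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math


def build_dictionary(samples, max_len=8):
    """LZW-style training over byte strings of one class.

    Assumption: phrases longer than max_len bytes are never stored,
    which plays the role of the string-length threshold.
    """
    # Start with every single byte, as in standard LZW.
    dictionary = {bytes([b]) for b in range(256)}
    for data in samples:
        phrase = b""
        for b in data:
            candidate = phrase + bytes([b])
            if candidate in dictionary:
                phrase = candidate          # keep extending the match
            else:
                if len(candidate) <= max_len:
                    dictionary.add(candidate)
                phrase = bytes([b])         # restart from the current byte
    return dictionary


def compressed_bits(data, dictionary, max_len=8):
    """Size in bits of data parsed greedily with a fixed dictionary.

    Assumption: every emitted code costs ceil(log2(len(dictionary))) bits.
    """
    code_bits = math.ceil(math.log2(len(dictionary)))
    codes = 0
    i = 0
    while i < len(data):
        # Longest phrase first; a single byte always matches.
        for length in range(min(max_len, len(data) - i), 0, -1):
            if data[i:i + length] in dictionary:
                i += length
                codes += 1
                break
    return codes * code_bits


def classify(data, normal_dict, malicious_dict, max_len=8):
    """Assign data to the class whose dictionary gives the shorter encoding."""
    normal = compressed_bits(data, normal_dict, max_len)
    malicious = compressed_bits(data, malicious_dict, max_len)
    # Ties are resolved in favour of "normal" (an arbitrary choice).
    return "malicious" if malicious < normal else "normal"


if __name__ == "__main__":
    # Toy training data, purely illustrative.
    normal_dict = build_dictionary([b"hello world " * 20])
    malicious_dict = build_dictionary([b"\x90\x90\xeb\xfe" * 20])
    print(classify(b"\x90\x90\xeb\xfe" * 5, normal_dict, malicious_dict))
```

Comparing encoded sizes of the same file is equivalent to comparing compression ratios, so the class with the smaller size is the one with the better ratio. A practical detector would train on large corpora of normal and malicious files and tune the length threshold empirically.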