Script sample sets exhibit features at several levels, including obfuscation, statistics, and semantics. Based on this observation, a malicious JavaScript detection algorithm using multiple classes of features is proposed, and an offline analysis system for malicious JavaScript, JCAD (JavaScript Codes Analysis and Detection), is implemented. First, the obfuscation features of a script are extracted, and a C4.5 decision tree is used to identify obfuscated scripts, which are then deobfuscated. Next, the static statistical features of the script are extracted; the script is serialized according to its semantics, a dangerous sequence tree is constructed, and the dangerous sequence features of the script are extracted. Finally, the three types of features are used as input to a classifier built on a probabilistic neural network, which adapts well to the non-uniformity and continual growth of the script sample set, to determine whether a script is malicious. Experimental results show that the proposed algorithm achieves good detection accuracy and stability.
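The final classification stage can be illustrated with a minimal sketch of a generic probabilistic neural network, written here as a Gaussian Parzen-window classifier. This is not the JCAD implementation: the function names, the three-element feature vectors (standing in for one obfuscation, one statistical, and one dangerous-sequence score), the sample values, and the smoothing parameter `sigma` are all assumptions made for illustration.

```python
import math


def pnn_scores(x, training, sigma=0.5):
    """Return a Gaussian Parzen-window density estimate of x for each class."""
    scores = {}
    for label, patterns in training.items():
        total = 0.0
        for p in patterns:
            # Pattern layer: one Gaussian kernel per stored training sample.
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, p))
            total += math.exp(-sq_dist / (2 * sigma ** 2))
        # Summation layer: average per class, so a class with more
        # samples does not win simply because it is larger.
        scores[label] = total / len(patterns)
    return scores


def pnn_classify(x, training, sigma=0.5):
    """Decision layer: pick the class with the highest estimated density."""
    scores = pnn_scores(x, training, sigma)
    return max(scores, key=scores.get)


def add_sample(training, label, pattern):
    """Adding a sample only appends a pattern unit; nothing is retrained."""
    training.setdefault(label, []).append(pattern)


# Hypothetical feature vectors: [obfuscation, statistical, dangerous-sequence].
training = {
    "benign": [[0.1, 0.2, 0.0], [0.2, 0.1, 0.1]],
    "malicious": [[0.9, 0.8, 1.0], [0.8, 0.9, 0.7]],
}

label = pnn_classify([0.85, 0.9, 0.9], training)
```

Under these assumptions, the per-class averaging and the append-only `add_sample` step suggest why this kind of network may suit a sample set that is unevenly distributed and keeps growing, since new scripts can be incorporated without retraining existing units.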