Fusing and deeply processing the multi-dimensional features of malicious code is both an emerging trend and a difficult problem in malware classification research. This paper presents a high-dimensional feature fusion method for malware classification. First, it uses static analysis to extract features from both the binary files and the disassembly files of malware samples. Second, drawing on the locality-sensitive idea behind SimHash, it fuses and processes the resulting high-dimensional feature vectors. Finally, it trains classical machine learning methods on the fused feature vectors. Experimental results and analysis show that the proposed method suits malware classification scenarios in which the feature dimensionality is high but only a small number of samples are available, and that it also improves the time performance of classification learning.
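The abstract does not spell out the fusion step, so the following is only a minimal sketch of how a SimHash-style reduction might compress several high-dimensional feature groups into one short vector. The function names (`simhash`, `fuse`, `hamming`), the use of SHA-256 as the per-feature hash, the 64-bit fingerprint width, and the sample feature names are all assumptions made for illustration, not details taken from the paper.

```python
import hashlib


def simhash(weighted_features, n_bits=64):
    """Fold a {feature_name: weight} mapping into an n_bits-long 0/1 fingerprint.

    Assumption: SHA-256 serves as the per-feature hash, so n_bits must be <= 256.
    """
    if not 0 < n_bits <= 256:
        raise ValueError("n_bits must be between 1 and 256")
    acc = [0.0] * n_bits
    for feature, weight in weighted_features.items():
        digest = hashlib.sha256(feature.encode("utf-8")).digest()
        h = int.from_bytes(digest, "big")
        for i in range(n_bits):
            # Each feature votes +weight or -weight on every bit position.
            if (h >> i) & 1:
                acc[i] += weight
            else:
                acc[i] -= weight
    # The sign of each accumulated vote becomes one fingerprint bit.
    return [1 if v > 0 else 0 for v in acc]


def fuse(feature_groups, n_bits=64):
    """Concatenate one fingerprint per feature group (groups visited in sorted order)."""
    fused = []
    for name in sorted(feature_groups):
        fused.extend(simhash(feature_groups[name], n_bits))
    return fused


def hamming(a, b):
    """Number of positions at which two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical sample: byte n-gram counts from the binary, opcode counts from the disassembly.
sample = {
    "byte_ngrams": {"4d5a": 3, "9000": 1, "0300": 2},
    "opcodes": {"mov": 10, "push": 4, "call": 2},
}
vector = fuse(sample)
print(len(vector))
```

Under these assumptions, each sample is reduced to a fixed-length bit vector regardless of how many raw features it had, and samples with similar weighted features tend to lie close in Hamming distance. Such a vector could then be passed to any classical classifier.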