API call sequences are a commonly used basis for malware detection and classification, but effective methods for choosing which API calls to use as features are lacking. As a result, the selected features contain a large amount of redundancy, and classification accuracy is low. This paper proposes and implements a malware detection method based on optimized feature selection using information gain. When selecting APIs as classification features, the method considers not only information gain but also the frequency and the degree of concentration of each API, so that it can effectively select the features that contribute most to classification. Experimental results show that the method achieves a high malware detection rate while maintaining a low false alarm rate on benign software.
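The selection idea summarized above can be illustrated with a minimal sketch. It assumes each sample is represented as the set of API names it calls and that each feature is the binary presence of one API. The abstract does not give the formulas for frequency or concentration, nor how they are combined with information gain, so the definitions below (`frequency` as the share of samples containing the API, `concentration` as the share of an API's occurrences falling in its dominant class) and the threshold parameters `min_frequency` and `min_concentration` are assumptions made purely for illustration, not the authors' method.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(samples, labels, api):
    """Information gain of the binary feature 'api is called by the sample'."""
    with_api = [y for s, y in zip(samples, labels) if api in s]
    without_api = [y for s, y in zip(samples, labels) if api not in s]
    total = len(labels)
    conditional = ((len(with_api) / total) * entropy(with_api)
                   + (len(without_api) / total) * entropy(without_api))
    return entropy(labels) - conditional


def frequency(samples, api):
    """Assumed definition: fraction of all samples that call the API."""
    return sum(api in s for s in samples) / len(samples)


def concentration(samples, labels, api):
    """Assumed definition: share of the API's occurrences in its dominant class."""
    hits = Counter(y for s, y in zip(samples, labels) if api in s)
    total = sum(hits.values())
    return max(hits.values()) / total if total else 0.0


def select_features(samples, labels, k, min_frequency=0.0, min_concentration=0.0):
    """Keep APIs passing both (hypothetical) thresholds, then take the top k by gain."""
    candidates = [
        api for api in set().union(*samples)
        if frequency(samples, api) >= min_frequency
        and concentration(samples, labels, api) >= min_concentration
    ]
    # Sort by descending information gain; break ties by name for determinism.
    candidates.sort(key=lambda api: (-information_gain(samples, labels, api), api))
    return candidates[:k]


# Toy data: two malware and two benign samples (illustrative only).
samples = [
    {"CreateRemoteThread", "ReadFile"},
    {"CreateRemoteThread", "WriteFile"},
    {"ReadFile"},
    {"ReadFile", "WriteFile"},
]
labels = ["malware", "malware", "benign", "benign"]
top = select_features(samples, labels, k=1, min_concentration=0.6)
```

In this toy data `CreateRemoteThread` appears only in malware samples, so it separates the classes perfectly, while `WriteFile` is split evenly between classes and carries no information. Filtering before ranking is one possible design; a weighted combination of the three quantities would be an equally plausible reading of the abstract.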