This paper proposes a more flexible and extensible malware detection method based on finer-grained runtime features: the names of called API functions, the input parameters of those API functions, and the combination of the two. After extracting these three categories of features, it uses entropy from information theory to define the information gain of a feature for malicious code, and computes the information gain of each API and its parameters in distinguishing malware from benign software. Features with high discriminative power are then selected, which reduces the number of features and therefore the analysis time. Experiments show that, with a small number of selected features and a high recognition rate, the detection method combining API functions with their parameters clearly outperforms the current mainstream recognition algorithms based on API sequences.
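The feature selection step described above can be illustrated with a minimal sketch. It assumes the standard entropy-based definition of information gain over a binary "feature present / feature absent" split; the paper's exact definition may differ. The sample data, the `API:argument` encoding of combined features, and the helper names `entropy`, `information_gain`, and `select_top_features` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(samples, labels, feature):
    """Entropy reduction obtained by splitting on presence of `feature`.

    Assumption: each sample is a set of feature strings observed at run time.
    """
    present = [y for x, y in zip(samples, labels) if feature in x]
    absent = [y for x, y in zip(samples, labels) if feature not in x]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


def select_top_features(samples, labels, k):
    """Return the k features with the highest information gain.

    Ties are broken alphabetically so the result is deterministic.
    """
    features = set().union(*samples)
    ranked = sorted(
        features,
        key=lambda f: (-information_gain(samples, labels, f), f),
    )
    return ranked[:k]


# Hypothetical toy data: API names and "API:argument" combined features.
samples = [
    {"WriteProcessMemory", "CreateRemoteThread", "CreateFileW"},
    {"WriteProcessMemory", "CreateFileW", "CreateFileW:autorun.inf"},
    {"CreateFileW", "ReadFile"},
    {"ReadFile"},
]
labels = ["malware", "malware", "benign", "benign"]

top = select_top_features(samples, labels, 2)
```

In this toy set, a feature that appears in every malware sample and in no benign sample separates the two classes perfectly, so it ranks above a feature such as `CreateFileW` that occurs in both classes. Keeping only the top-ranked features is what reduces the feature count and, in turn, the analysis time.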