IP flow classification based on application-layer payload features is highly accurate, but when the feature library is large, matching each packet against the entire library takes a great deal of time. To address this problem, this paper proposes an IP flow classification method that combines application-layer payload features with heuristic search. First, common features are extracted from the packets generated by a variety of applications and used to build heuristic rules. Second, the feature library is divided into several feature subsets according to these rules. During classification, each packet is then matched only against the specific feature subset selected by the heuristic rules. This greatly reduces matching against irrelevant features, makes the matched subset more targeted, and improves time performance. For some applications, packets are classified with a DNS-guided method, which partially overcomes the inability of payload-based classification to identify encrypted traffic. The algorithm is implemented in C and compared experimentally with the open-source l7-filter. The experimental results show that, in offline mode, the proposed method classifies traffic 6 to 10 times faster than l7-filter, with an overall identification accuracy above 98%.
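The partition-then-match idea and the DNS-guided fallback can be sketched in C, the language the paper's implementation uses. This is a minimal sketch under assumptions that do not come from the paper: the heuristic rule is assumed to be simply the first payload byte, signatures are simplified to fixed prefixes anchored at offset 0 (instead of the regular expressions used by l7-filter), and the domain-suffix rules, the IP-to-application table, and all names (`FeatureIndex`, `index_build`, `classify_payload`, `dns_learn`, `classify_flow`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical signature: payload must start with `prefix` (a stand-in
 * for the regular-expression patterns that l7-filter uses). */
typedef struct {
    const char *app;
    const unsigned char *prefix;
    size_t len;
} Signature;

#define NBUCKETS 256       /* one feature subset per first-byte value */
#define MAX_PER_BUCKET 8   /* assumed capacity of each subset */

/* Feature library partitioned into subsets by the heuristic key. */
typedef struct {
    const Signature *bucket[NBUCKETS][MAX_PER_BUCKET];
    size_t count[NBUCKETS];
} FeatureIndex;

/* Assumed heuristic rule: the first byte selects the subset. */
static unsigned heuristic_key(const unsigned char *bytes) {
    return bytes[0];
}

/* Partition the library; returns 0 on success, -1 if a signature is
 * empty or its subset is full. Pointers into `sigs` are stored, so the
 * array must outlive the index. */
int index_build(FeatureIndex *ix, const Signature *sigs, size_t n) {
    assert(ix != NULL && (sigs != NULL || n == 0));
    memset(ix->count, 0, sizeof ix->count);
    for (size_t i = 0; i < n; i++) {
        if (sigs[i].len == 0)
            return -1;
        unsigned k = heuristic_key(sigs[i].prefix);
        if (ix->count[k] == MAX_PER_BUCKET)
            return -1;
        ix->bucket[k][ix->count[k]++] = &sigs[i];
    }
    return 0;
}

/* Match the payload only against the subset chosen by the rule. */
const char *classify_payload(const FeatureIndex *ix,
                             const unsigned char *payload, size_t len) {
    assert(ix != NULL);
    if (payload == NULL || len == 0)
        return NULL;
    unsigned k = heuristic_key(payload);
    for (size_t i = 0; i < ix->count[k]; i++) {
        const Signature *s = ix->bucket[k][i];
        if (s->len <= len && memcmp(payload, s->prefix, s->len) == 0)
            return s->app;
    }
    return NULL;  /* no payload signature matched */
}

/* Hypothetical DNS-guided part: map a domain suffix to an application. */
typedef struct { const char *suffix; const char *app; } DomainRule;
typedef struct { uint32_t ip; const char *app; } DnsEntry;

#define MAX_DNS 64
static DnsEntry dns_table[MAX_DNS];
static size_t dns_count;

static int ends_with(const char *s, const char *suffix) {
    size_t a = strlen(s), b = strlen(suffix);
    return a >= b && strcmp(s + a - b, suffix) == 0;
}

/* Record the address from a DNS A record (IPv4, host byte order) if the
 * queried name matches one of the rules; extra entries are dropped. */
void dns_learn(const DomainRule *rules, size_t nrules,
               const char *qname, uint32_t ip) {
    for (size_t i = 0; i < nrules; i++) {
        if (ends_with(qname, rules[i].suffix)) {
            if (dns_count < MAX_DNS)
                dns_table[dns_count++] = (DnsEntry){ip, rules[i].app};
            return;
        }
    }
}

/* Payload subset first; otherwise fall back to the DNS-derived table,
 * which can label encrypted flows that carry no matching signature. */
const char *classify_flow(const FeatureIndex *ix, uint32_t server_ip,
                          const unsigned char *payload, size_t len) {
    const char *app = classify_payload(ix, payload, len);
    if (app != NULL)
        return app;
    for (size_t i = 0; i < dns_count; i++)
        if (dns_table[i].ip == server_ip)
            return dns_table[i].app;
    return "unknown";
}
```

For example, after indexing the prefixes `GET `, `POST ` and `\x13BitTorrent protocol`, a payload beginning with `G` is compared only with the signatures keyed on `G`, while a TLS flow (first byte `0x16`, no matching subset) to an address previously learned from a DNS answer under a configured suffix such as `.example.com` is labeled from the DNS table. With a first-byte key, each lookup scans one of up to 256 small subsets rather than the whole library; the heuristic rules described in the paper, built from features common to many applications, may partition the library differently.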