Classifying IP flows by their payload signatures is highly accurate, but the approach depends on extracting accurate payload signatures in the first place. At present, the payload signatures of most applications are obtained by manually reverse-engineering packet structures. Manually analyzing the packets an application generates is very time-consuming, especially for an unknown application. To address this, this paper designs and implements an algorithm for automatic payload signature extraction that combines fixed-position payload signatures with the common substrings of the payload signatures. The algorithm automatically extracts application-layer payload signatures and constructs regular expressions from them. In addition to common signature strings, it also extracts fixed-position single-byte signatures, which many signature extraction algorithms overlook. Experimental results verify the effectiveness and accuracy of the proposed algorithm.
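To make the idea of combining the two kinds of signatures concrete, the following is a minimal Python sketch. It is not the algorithm proposed in the paper, whose details are not given here. It assumes a brute-force search for the longest common substring, treats a byte as a fixed-position signature only if every sample payload has the same value at that offset, and uses hypothetical function names, thresholds (`max_offset`, `min_len`), and invented sample payloads.

```python
import re


def fixed_position_bytes(payloads, max_offset=16):
    """Return {offset: byte value} for offsets where all payloads agree."""
    limit = min(max_offset, min(len(p) for p in payloads))
    first = payloads[0]
    return {i: first[i] for i in range(limit)
            if all(p[i] == first[i] for p in payloads)}


def longest_common_substring(payloads, min_len=3):
    """Brute-force longest byte string present in every payload.

    Returns b"" when no common substring of at least min_len bytes exists.
    """
    shortest = min(payloads, key=len)
    for length in range(len(shortest), min_len - 1, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in p for p in payloads):
                return candidate
    return b""


def build_regex(payloads, max_offset=16, min_len=3):
    """Combine both signature kinds into one bytes regular expression."""
    fixed = fixed_position_bytes(payloads, max_offset)
    substring = longest_common_substring(payloads, min_len)
    parts = [b"(?s)^"]  # (?s): let "." match any byte, including newline
    if substring:
        # A lookahead keeps the substring position-independent, so it may
        # overlap the fixed-position prefix without breaking the match.
        parts.append(b"(?=.*" + re.escape(substring) + b")")
    if fixed:
        for i in range(max(fixed) + 1):
            parts.append(re.escape(bytes([fixed[i]])) if i in fixed else b".")
    return b"".join(parts)


# Invented sample payloads: byte 0 and byte 2 are constant, and the
# string "KEY=" appears at a different offset in each payload.
payloads = [
    b"\xe3\x10\x00hello-KEY=1",
    b"\xe3\x22\x00xxKEY=abc",
    b"\xe3\x05\x00KEY=zz-tail",
]
pattern = build_regex(payloads)
print(pattern)
print(all(re.match(pattern, p) for p in payloads))
```

In this sketch the single bytes at offsets 0 and 2 would be missed by a method that only looks for common substrings of some minimum length, which is the gap the fixed-position signatures are meant to cover.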