To recover the format of unknown protocol packets, a packet format mining method based on length semantic constraints is proposed. Building on multiple sequence alignment, which partitions each packet into segments, the method iteratively applies a length-field scanning algorithm both across segments and within them to infer the length fields in a packet and the field or group of fields that each one refers to, and thereby obtains the hierarchical structure of the unknown protocol's packets. Experiments demonstrate the effectiveness of the method: for SNMPv1 (Simple Network Management Protocol version 1) GetNextRequest and GetResponse packets, the false negative rate of length-field mining is 9.1% in both cases, and the false positive rates are 16.7% and 23.1%, respectively. The recovered packet structure is also largely consistent with the protocol specification.
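The length semantic constraint can be illustrated with a minimal sketch. It is not the algorithm evaluated in the paper: it assumes a one-byte length field whose value equals the number of bytes between it and the end of the enclosing segment, and it omits the multiple sequence alignment step. The function name `scan_length_fields` and the hand-built SNMPv1 GetNextRequest bytes are illustrative assumptions.

```python
from typing import List, Tuple


def scan_length_fields(data: bytes, start: int = 0, end: int = -1) -> List[Tuple[int, int]]:
    """Return (offset, value) pairs for candidate length fields in data[start:end].

    Assumption: a candidate is a single byte whose value equals the number
    of bytes that follow it up to the end of the segment.
    """
    if end < 0:
        end = len(data)
    hits = []
    for i in range(start, end):
        remaining = end - (i + 1)  # bytes after offset i inside the segment
        if data[i] == remaining:
            hits.append((i, remaining))
    return hits


# Hand-built SNMPv1 GetNextRequest for sysDescr.0 with community "public"
# (illustrative bytes, not taken from the paper's data set).
packet = bytes.fromhex(
    "3026020100"            # SEQUENCE (38 bytes), version 0
    "04067075626c6963"      # OCTET STRING "public"
    "a119"                  # GetNextRequest PDU (25 bytes)
    "020101020100020100"    # request-id, error-status, error-index
    "300e300c"              # varbind list (14 bytes), varbind (12 bytes)
    "06082b06010201010100"  # OID 1.3.6.1.2.1.1.1.0
    "0500"                  # NULL value
)

# Scan the whole message, then a single segment (the community string).
candidates = scan_length_fields(packet)
community = scan_length_fields(packet, 5, 13)
```

On this packet the whole-message scan reports offsets 1, 14, 25, 27 and 39, which are the length bytes of the nested constructs that end at the message boundary. Length fields whose referred field ends earlier, such as the community string's, appear only when the scan is restricted to the corresponding segment, which is why a segmentation step and repeated scanning inside segments are needed.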