要深入了解网络中的各种应用过程,进而对这些应用进行自动分类、识别、跟踪和控制,首先就要获得代表这些应用会话过程的状态机.为此提出一种新的方法从采集的应用层数据中反推协议状态机.它采用基于差错纠正的文法推断方法,利用应用层协议交互过程中出现的标识符状态序列,逆向工程其协议状态机.为充分挖掘和发挥差错纠正的性能,提出了最佳路径匹配标准确定纠正路径,以及基于概率统计的异常入度区分及其剪枝的方法;通过去重的状态合并和相似行为意义的协议结构化简措施解决状态膨胀问题,从而获取最精简的协议状态机.通过在包含多种应用层协议的实际网络中的实验,验证了该方法的有效性.
To deeply understand procedures of various network applications, and to automatically classify, recognize, trace and control them, protocol state machine that represents the application sessions have to be obtained in advance. A novel approach is presented to reversely infer protocol state machine from collected application layer data. Protocol state machine is derived with a method of error-correcting grammatical inference based on the state sequences that appear in the application sessions. To richly mine and bring into play the performance of error-collecting, a criterion of best- matching path is presented to solve the difficulty of path selection during the error-correcting process. A method with regard to abnormal indegree discrimination and pruning on the basis of statistical probability is proposed. Moreover, negative example sets with similar tokens are adopted to reinforce the error-collecting performance. In order to solve the state expansion during the reconstruction of the state machine, a simplifying measure to obtain a compact protocol state machine that expresses the internal operating mechanism of the protocol accurately is used based on state merging with removal of the identical token and model reduction with a similar behavioral semantic. The experiments conducted in a real network, containing a number of real applications with several application layer protocols, validate this method.