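To address traffic measurement in high-speed IPv6 networks, a traffic sampling measurement algorithm based on analysis of packet header content is proposed. The algorithm applies mask matching to key fields of the IPv6 packet header, maps the result through a hash function, and decides whether to capture a packet by checking whether the hash value falls within the sampling range. Its distinguishing feature is the use of information entropy theory to analyze the IPv6 packet header and select the fields with the highest entropy as the key fields for mask matching. This avoids processing the entire header content and, while preserving the randomness of the sample, effectively reduces the amount of computation. Experimental results show that the packet size distribution curves of the total traffic and of the sample agree closely, which verifies the correctness of the algorithm.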
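The entropy-based field selection can be illustrated with a minimal sketch. It assumes each captured header is available as a 40-byte `bytes` object and scores every byte position by the Shannon entropy of the byte values observed there. The function names `byte_entropy` and `rank_offsets` are hypothetical, and the exact entropy measure used in the paper (which speaks of bit randomness per byte) may differ from this byte-value variant.

```python
import math
from collections import Counter


def byte_entropy(headers, offset):
    """Shannon entropy, in bits, of the byte at `offset` across all headers."""
    counts = Counter(h[offset] for h in headers)
    total = len(headers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def rank_offsets(headers, header_len=40):
    """Return (entropy, offset) pairs for each header byte, highest entropy first.

    header_len=40 is the size of the fixed IPv6 header.
    """
    scores = [(byte_entropy(headers, i), i) for i in range(header_len)]
    return sorted(scores, reverse=True)
```

Positions near the top of the ranking are candidates for the hash key. Fields that routers rewrite in transit, such as Hop Limit, would still need to be excluded by hand, since entropy alone does not capture that property.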
Traffic sampling techniques are widely used for traffic measurement at high link speeds to prevent resource exhaustion and to limit measurement costs. However, the challenge of an effective sampling method for IPv6-based networks has not yet been met. This paper proposes a traffic sampling measurement method to address this challenge. To ensure the randomness of the sample, we use entropy as an evaluation tool to analyze the bit randomness of each byte in IPv6 packet headers. We conclude that the last byte of the Payload Length field and bytes 8, 12, 14, 15 and 16 of the IPv6 source and destination address fields both remain unchanged during forwarding and have high bit entropy values. We decide whether a packet is sampled based on a hash function computed over the selected bytes, so the entire packet header content does not need to be taken into account. The advantages of the method are the improved randomness of the sample and the runtime efficiency of the sampling algorithm. Finally, through experiments using real IPv6 traffic traces, we show that the sampled traffic data correctly reflect the packet size distribution of the full packet trace.
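The hash-based selection step can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the byte numbers are read as 1-based positions within each 16-byte address, CRC-32 stands in for the unspecified hash function, and the names `KEY_OFFSETS`, `is_sampled`, `rate` and `buckets` are hypothetical.

```python
import zlib

# Offsets within the 40-byte fixed IPv6 header.
PAYLOAD_LEN_LAST = 5             # Payload Length occupies bytes 4-5
SRC_BASE, DST_BASE = 8, 24       # start of source and destination address
ADDR_BYTES = (8, 12, 14, 15, 16) # assumed 1-based within each address

KEY_OFFSETS = (
    (PAYLOAD_LEN_LAST,)
    + tuple(SRC_BASE + n - 1 for n in ADDR_BYTES)
    + tuple(DST_BASE + n - 1 for n in ADDR_BYTES)
)


def is_sampled(header: bytes, rate: float = 0.01, buckets: int = 1 << 16) -> bool:
    """Decide whether to capture a packet from its selected header bytes.

    The hash value is reduced to `buckets` slots; the packet is sampled
    when it lands in the first `rate * buckets` slots (the sampling range).
    """
    if len(header) < 40:
        raise ValueError("need the full 40-byte IPv6 fixed header")
    key = bytes(header[i] for i in KEY_OFFSETS)
    # CRC-32 is an assumption; the source does not name its hash function.
    return zlib.crc32(key) % buckets < int(rate * buckets)
```

Because the key contains only bytes that stay constant along the path, the decision does not depend on fields such as Hop Limit, so every observation point using the same parameters would select the same packets.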