Growing network traffic places higher demands on the real-time performance of intrusion detection systems, and compressing the training data can speed up the classification of unknown samples. To address the low efficiency of compression and clustering on very large datasets, an improved adaptive affinity propagation (AP) clustering method is proposed. Samples that lie close to a cluster center are linked directly to that center rather than clustered, which reduces the number of samples to be clustered and lowers the time and memory cost of clustering. The clustering parameters are then adjusted continuously according to the association results to refine the clustering. Results on two network security datasets show that the method effectively extracts a representative subset from large-scale samples and improves the timeliness of intrusion detection while maintaining accuracy.
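
The direct-association step can be illustrated with a minimal sketch. It is not the authors' implementation: the Euclidean distance, the fixed `radius` threshold, and the function name `associate` are assumptions made here for illustration, and the adaptive parameter adjustment is left out because the abstract does not specify it.

```python
import math

def associate(samples, exemplars, radius):
    """Split samples into those linked to a nearby exemplar and the rest.

    Hypothetical helper: `radius` is an assumed fixed threshold.
    Returns (links, remaining), where links maps an exemplar index to the
    indices of samples within `radius` of it (nearest exemplar wins), and
    remaining lists the indices of samples that still need clustering.
    """
    links = {i: [] for i in range(len(exemplars))}
    remaining = []
    for idx, sample in enumerate(samples):
        # Find the nearest existing exemplar (cluster center).
        best, best_d = None, math.inf
        for e_idx, exemplar in enumerate(exemplars):
            d = math.dist(sample, exemplar)
            if d < best_d:
                best, best_d = e_idx, d
        if best is not None and best_d <= radius:
            links[best].append(idx)   # linked directly, not clustered
        else:
            remaining.append(idx)     # passed on to the clustering step

    return links, remaining

exemplars = [(0.0, 0.0), (10.0, 10.0)]
samples = [(0.5, 0.2), (9.8, 10.1), (5.0, 5.0), (0.1, -0.3)]
links, remaining = associate(samples, exemplars, radius=1.0)
print(links, remaining)
```

In this toy input only the sample at (5.0, 5.0) lies outside the radius of both centers, so it is the only one left for clustering; the other three are absorbed by existing centers.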