To address the low accuracy of traditional clustering-based methods for network traffic anomaly detection, a traffic anomaly detection method based on improved k-means clustering is proposed. The various kinds of traffic feature data are first preprocessed so that the k-means algorithm can be applied to enumerated (categorical) data. A feature selection method for high-dimensional data, based on analysis of the value distribution of the features, is then presented; it effectively resolves the problem of distance measures losing their meaning when the dimensionality is too high. In addition, the partition into K clusters is optimized by bisection, which reduces the influence of the initial cluster center selection on the k-means result and further improves the detection rate. Finally, simulation experiments verify the effectiveness of the proposed algorithm.
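As an illustration only, the preprocessing of enumerated data and the bisection-based partition into K clusters might be sketched as follows. The abstract does not give the exact encoding, split criterion, or feature selection rule, so this sketch makes several assumptions: enumerated values are one-hot encoded, the cluster with the largest within-cluster sum of squared errors (SSE) is the one split at each step, and each split is the best of a few 2-means trials. The names `one_hot`, `sse`, `two_means`, `bisecting_kmeans`, `n_trials`, and `seed` are hypothetical, and the feature selection step is omitted.

```python
import numpy as np

def one_hot(values):
    """Encode enumerated values as 0/1 columns (assumed encoding)."""
    cats = sorted(set(values))
    return np.array([[1.0 if v == c else 0.0 for c in cats] for v in values])

def sse(X):
    """Sum of squared distances of the rows of X to their mean."""
    return float(((X - X.mean(axis=0)) ** 2).sum())

def two_means(X, rng, n_iter=20):
    """Plain 2-means on X; returns a boolean mask for the second cluster."""
    # Start from two distinct, randomly chosen rows.
    centers = X[rng.choice(len(X), size=2, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance of every row to both centers.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        if labels.min() == labels.max():  # one side is empty: give up
            break
        new_centers = np.array([X[labels == j].mean(axis=0) for j in (0, 1)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels == 1

def bisecting_kmeans(X, k, n_trials=5, seed=0):
    """Split the data into at most k clusters by repeated bisection."""
    rng = np.random.default_rng(seed)
    labels = np.zeros(len(X), dtype=int)
    for new_label in range(1, k):
        # Assumption: split the cluster with the largest SSE.
        target = max(range(new_label), key=lambda c: sse(X[labels == c]))
        idx = np.flatnonzero(labels == target)
        if len(idx) < 2:
            break
        best_mask, best_cost = None, np.inf
        for _ in range(n_trials):
            mask = two_means(X[idx], rng)
            if mask.all() or not mask.any():
                continue  # degenerate split, try again
            cost = sse(X[idx][mask]) + sse(X[idx][~mask])
            if cost < best_cost:
                best_mask, best_cost = mask, cost
        if best_mask is None:  # no valid split found
            break
        labels[idx[best_mask]] = new_label
    return labels

# Toy usage: two well-separated groups of points.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
labels = bisecting_kmeans(X, k=2)
protocol_columns = one_hot(["tcp", "udp", "tcp"])
```

Because every split is a 2-means problem and the best of several trials is kept, the final partition depends less on any single random initialization than a one-shot k-means with K random centers does. How the resulting clusters are then labelled as normal or anomalous is not specified in the abstract and is left out of this sketch.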