The K-means clustering algorithm has two major shortcomings when applied to intrusion detection: the initial cluster centers are chosen at random, and the algorithm easily becomes trapped in a local optimum. This paper proposes an improved K-means algorithm. First, high-density regions are identified through data filtering. Next, the two mutually farthest points are taken as the first two initial cluster centers, and a non-fuzzy cluster validity index is used to determine the remaining initial centers. Finally, cluster analysis is performed. Experiments show that the improved algorithm no longer depends on a randomly chosen value of K or on random initial cluster centers, so the clustering process adapts to the data set itself, while maintaining a high detection rate for network intrusions and a low false-alarm rate.