Real-time attack data sets contain missing features and a large number of non-attack samples, so they exhibit incomplete feature distributions and skewed class distributions, both of which hinder clustering analysis. To address this problem, a two-phase clustering algorithm for incomplete attack data sets is proposed. First, a standard two-class support vector machine separates the non-attack samples from the data set, which balances the class distribution. Second, a method for measuring the distance between incomplete samples is proposed and applied within the nearest-neighbor interval fuzzy C-means algorithm to perform the clustering. Experimental results show that the proposed algorithm achieves higher clustering accuracy than existing algorithms.
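The abstract does not specify the proposed distance measure, so the following is only a minimal sketch of how fuzzy C-means can be run on samples with missing features. It assumes the well-known partial distance strategy as a stand-in, which is not necessarily the measure proposed here. The function names `partial_distance`, `memberships`, and `update_centers` are hypothetical, missing values are marked with `None`, and both the SVM filtering stage and the nearest-neighbor interval representation are omitted.

```python
def partial_distance(x, y):
    """Squared Euclidean distance over the features observed in both samples,
    rescaled by (total features / shared features). None marks a missing value.
    Assumption: partial distance strategy, not the paper's own measure."""
    shared = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if not shared:
        raise ValueError("samples share no observed features")
    scale = len(x) / len(shared)
    return scale * sum((a - b) ** 2 for a, b in shared)


def memberships(sample, centers, m=2.0):
    """Fuzzy C-means membership of one sample in each cluster (fuzzifier m > 1)."""
    d = [partial_distance(sample, c) for c in centers]
    # A sample that coincides with one or more centers gets crisp membership.
    if any(v == 0.0 for v in d):
        hits = [1.0 if v == 0.0 else 0.0 for v in d]
        total = sum(hits)
        return [h / total for h in hits]
    # d holds squared distances, so the exponent is 1/(m-1) rather than 2/(m-1).
    power = 1.0 / (m - 1.0)
    inv = [(1.0 / v) ** power for v in d]
    total = sum(inv)
    return [v / total for v in inv]


def update_centers(samples, U, m=2.0):
    """Recompute each center feature from the samples that observe that feature.
    U[k][j] is the membership of sample k in cluster j."""
    n_features = len(samples[0])
    n_clusters = len(U[0])
    centers = []
    for j in range(n_clusters):
        center = []
        for f in range(n_features):
            num = den = 0.0
            for x, u in zip(samples, U):
                if x[f] is not None:
                    w = u[j] ** m
                    num += w * x[f]
                    den += w
            # No weighted observation of this feature: leave it missing.
            center.append(num / den if den > 0 else None)
        centers.append(center)
    return centers
```

Alternating `memberships` over all samples with `update_centers` until the centers stop changing gives the usual fuzzy C-means loop. In the two-phase setting described above, that loop would run only on the samples left after the support vector machine has removed the non-attack class.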