To address the low accuracy of traditional clustering algorithms for intrusion detection, we draw on semi-supervised learning and propose a semi-supervised clustering algorithm for intrusion detection. The algorithm first uses the labeled portion of the sample dataset to generate a seed set for initializing the clusters. It then obtains the initial center of each class by computing the Euclidean distance between the labeled points in the sample dataset and the mean of the labeled points in each cluster, which enables accurate identification of intrusion detection data. This avoids the blindness and randomness with which traditional clustering algorithms choose their initial cluster centers, and it improves the detection rate. Experimental results show that, on intrusion detection data, the algorithm makes full use of a small amount of class-label information for semi-supervised learning and achieves better clustering and higher detection accuracy than the traditional K-means algorithm.
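The seed-based initialization described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact rule, so the sketch assumes that each initial center is simply the mean of the labeled seed points of its class, followed by ordinary K-means iterations with Euclidean distance. The names `seed_centers` and `seeded_kmeans`, and the parameters `seed_idx`, `seed_labels`, `max_iter`, and `tol`, are hypothetical.

```python
import numpy as np


def seed_centers(X, seed_idx, seed_labels):
    """Return the class names and one initial center per class.

    Assumption: a class's initial center is the mean of its labeled seeds.
    """
    seed_idx = np.asarray(seed_idx)
    seed_labels = np.asarray(seed_labels)
    classes = np.unique(seed_labels)  # sorted unique class labels
    centers = np.stack(
        [X[seed_idx[seed_labels == c]].mean(axis=0) for c in classes]
    )
    return classes, centers


def seeded_kmeans(X, seed_idx, seed_labels, max_iter=100, tol=1e-6):
    """K-means whose initial centers come from labeled seeds, not random picks."""
    X = np.asarray(X, dtype=float)
    classes, centers = seed_centers(X, seed_idx, seed_labels)
    for _ in range(max_iter):
        # Euclidean distance from every sample to every center, shape (n, k).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(len(classes)):
            members = X[assign == j]
            if len(members):  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    # Map cluster indices back to the class names of the seeds.
    return classes[assign], centers


# Toy usage: two well-separated groups, one labeled seed per class.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]])
labels, centers = seeded_kmeans(X, seed_idx=[0, 4],
                                seed_labels=["normal", "attack"])
print(labels)
print(centers)
```

Because the centers start from labeled data, each cluster keeps the class name of its seeds, so the output can be read directly as a normal/attack prediction. With random initialization, clusters would first have to be matched to classes after the fact.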