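Intrusion detection data sets are large, have many features, and consist mainly of continuous-valued data. Rough set theory is an effective classification tool for handling uncertain, inconsistent, and massive data; its distinguishing property is that it performs feature selection while preserving the classification ability of the intrusion detection data set. To avoid the information loss caused by the discretization step that traditional rough set feature selection requires, we introduce the neighborhood rough set model and propose a feature selection method for network intrusion detection data based on neighborhood relations. The method starts from the full feature set, removes redundant features step by step according to feature significance, and finally obtains a key feature subset that is used for classification. Feature selection and classification experiments on the KDD CUP99 intrusion detection data set show that the method is effective and feasible.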
Intrusion detection data is large, largely continuous-valued, and described by many features, so feature selection plays an important role in intrusion detection. Rough set theory is an efficient classification tool for dealing with uncertain, inconsistent, and large-scale data. One limitation of classical rough set theory is that it lacks effective methods for processing real-valued data: such data must first be discretized, and discretization can cause information loss. To avoid this loss, this paper investigates an approach to intrusion detection feature selection based on neighborhood rough set theory. The approach starts from the full feature set, gradually removes redundant features according to their significance, and finally obtains a key feature subset that is used for classification. To evaluate the proposed approach, we applied it to the KDD CUP99 intrusion detection data set and compared the results with traditional rough set feature selection. Experimental results show that our algorithm is more effective at selecting highly discriminative features for a classification task.
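The backward elimination procedure summarized above can be illustrated with a minimal sketch. The abstract does not give the exact definitions used in the paper, so the following are assumptions made only for illustration: the neighborhood of a sample is the set of samples within Euclidean distance `delta` of it, the dependency of the decision on a feature subset is the fraction of samples whose neighborhood contains a single class label, and a feature is treated as redundant if dropping it does not lower that dependency. The names `dependency`, `backward_select`, `delta`, and `tol` are hypothetical, and the toy data is not taken from KDD CUP99.

```python
import math


def dependency(X, y, features, delta):
    """Fraction of samples whose delta-neighborhood (measured on the given
    features only) contains samples of a single class label."""
    n = len(X)
    consistent = 0
    for i in range(n):
        pure = True
        for j in range(n):
            # Euclidean distance restricted to the candidate feature subset
            # (assumed metric; the paper may use a different one).
            dist = math.sqrt(sum((X[i][f] - X[j][f]) ** 2 for f in features))
            if dist <= delta and y[i] != y[j]:
                pure = False
                break
        if pure:
            consistent += 1
    return consistent / n


def backward_select(X, y, delta=0.15, tol=1e-9):
    """Start from all features and repeatedly drop the least significant one
    while the dependency of the full feature set is preserved."""
    selected = list(range(len(X[0])))
    target = dependency(X, y, selected, delta)
    while len(selected) > 1:
        # Dependency that would remain after dropping each feature in turn.
        scores = {
            f: dependency(X, y, [g for g in selected if g != f], delta)
            for f in selected
        }
        least_significant = max(scores, key=scores.get)
        if scores[least_significant] < target - tol:
            break  # every remaining feature is needed
        selected.remove(least_significant)
    return selected


# Toy data: feature 0 separates the classes, feature 1 is constant,
# feature 2 is unrelated to the label.
X_toy = [
    [0.0, 0.5, 0.1],
    [0.1, 0.5, 0.9],
    [0.9, 0.5, 0.2],
    [1.0, 0.5, 0.8],
]
y_toy = [0, 0, 1, 1]
print(backward_select(X_toy, y_toy, delta=0.3))  # [0]
```

Because the neighborhood is computed directly on real-valued features, no discretization step is needed, which is the motivation stated in the abstract. The sketch recomputes all pairwise distances for every candidate subset, so it is only suitable for small examples; features would also normally be scaled to a common range before a single `delta` is applied.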