In large-scale intrusion detection, redundant and irrelevant features have long hampered network traffic classification: they reduce the accuracy and efficiency of the classifier and degrade the real-time detection rate of the system. This paper proposes a new mutual-information-based feature selection algorithm (NMIFS) that can handle both linearly and nonlinearly dependent features. During data preprocessing, the algorithm selects the optimal features, which are then passed to the widely used least squares support vector machine (LSSVM) for classification. The model is evaluated on the standard intrusion detection dataset KDD Cup 99 and compared with other recent optimization algorithms. The results show that NMIFS helps LSSVM achieve higher classification accuracy and efficiency, reduces computational complexity, and improves the detection rate of the model.
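The pipeline described above (mutual-information-based feature selection followed by LSSVM classification) can be illustrated with a minimal sketch. The abstract does not state the selection criterion, so the code below assumes a generic greedy rule: relevance to the class label minus the mean normalized mutual information with already selected features. The classifier solves the standard LSSVM linear system with an RBF kernel. All function names, the equal-width discretization, and the parameters `bins`, `gamma`, and `sigma` are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

def discretize(x, bins=8):
    """Map a continuous column to integer bin indices (equal-width bins)."""
    edges = np.linspace(x.min(), x.max(), bins + 1)[1:-1]
    return np.digitize(x, edges)

def entropy(a):
    """Shannon entropy (in nats) of a discrete integer array."""
    _, counts = np.unique(a, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def mutual_info(a, b):
    """I(a; b) = H(a) + H(b) - H(a, b) for non-negative integer arrays."""
    joint = a * (b.max() + 1) + b  # one unique code per (a, b) pair
    return entropy(a) + entropy(b) - entropy(joint)

def select_features(X, y, k, bins=8):
    """Greedy selection (assumed criterion): relevance to y minus the
    mean normalized redundancy with features already selected.
    y must hold non-negative integer class labels."""
    Xd = np.column_stack([discretize(X[:, j], bins) for j in range(X.shape[1])])
    H = [entropy(Xd[:, j]) for j in range(Xd.shape[1])]
    selected, remaining = [], list(range(Xd.shape[1]))
    while len(selected) < k and remaining:
        def score(j):
            relevance = mutual_info(Xd[:, j], y)
            if not selected:
                return relevance
            redundancy = np.mean([
                mutual_info(Xd[:, j], Xd[:, s]) / max(min(H[j], H[s]), 1e-12)
                for s in selected
            ])
            return relevance - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

def rbf(A, B, sigma=1.0):
    """RBF kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Solve the LSSVM classifier system for bias b and multipliers alpha.
    y must hold labels in {-1, +1}."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = np.outer(y, y) * rbf(X, X, sigma) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]

def lssvm_predict(X_train, y_train, b, alpha, X_new, sigma=1.0):
    """Predicted labels: sign of the kernel expansion plus bias."""
    return np.sign(rbf(X_new, X_train, sigma) @ (alpha * y_train) + b)

# Synthetic demo (not KDD Cup 99): column 0 drives the label,
# column 1 is a noisy copy of it, columns 2 and 3 are pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:, 1] = X[:, 0] + 0.5 * rng.normal(size=200)
y01 = (X[:, 0] > 0).astype(int)           # labels for feature selection
chosen = select_features(X, y01, k=2)
y_pm = 2.0 * y01 - 1.0                    # labels in {-1, +1} for LSSVM
b, alpha = lssvm_fit(X[:, chosen], y_pm)
train_acc = (lssvm_predict(X[:, chosen], y_pm, b, alpha, X[:, chosen]) == y_pm).mean()
print(chosen, train_acc)
```

Because LSSVM replaces the inequality constraints of a standard SVM with equalities, training reduces to one dense linear solve of size n + 1. This is why reducing the feature set before classification mainly saves kernel evaluation cost rather than changing the size of the system.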