Existing filter feature selection algorithms do not consider the intrinsic structure of nonlinear data, so their classification accuracy is far lower than that of wrapper algorithms. To address this shortcoming, this paper proposes a feature selection algorithm for high-dimensional data based on reproducing kernel Hilbert space (RKHS) mapping. First, the algorithm constructs a search tree with the branch and bound method and searches it. Then, it analyzes the intrinsic structure of the nonlinear data through the RKHS mapping. Finally, it selects the optimal distance computation method according to the intrinsic structure of the data set. Comparative simulation experiments show that the proposed method achieves classification accuracy close to that of wrapper feature selection algorithms while being markedly more computationally efficient, which makes it suitable for big data analysis.
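The abstract does not specify the selection criterion or how the distance method is chosen. The following is therefore only a minimal sketch of how a branch and bound search tree can be combined with an RKHS distance. It assumes a hypothetical criterion J(S): the squared distance between the two classes' empirical mean embeddings in the RKHS of an inhomogeneous polynomial kernel (c + ⟨u, v⟩)^2, computed with the kernel trick. This is a swapped-in choice, not the paper's method. It was picked because branch and bound pruning is only valid when J never increases as features are removed. The polynomial kernel satisfies this, because adding a feature only adds coordinates to the feature map. A Gaussian kernel over the whole subset does not guarantee it. All function names (`poly_kernel`, `rkhs_mean_distance2`, `make_criterion`, `branch_and_bound`, `exhaustive`) are hypothetical.

```python
import itertools
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]
Subset = Tuple[int, ...]


def poly_kernel(u: Vector, v: Vector, c: float = 1.0, degree: int = 2) -> float:
    """Assumed kernel: inhomogeneous polynomial k(u, v) = (c + <u, v>)^degree."""
    return (c + sum(a * b for a, b in zip(u, v))) ** degree


def rkhs_mean_distance2(A: List[Vector], B: List[Vector],
                        kernel: Callable[[Vector, Vector], float]) -> float:
    """Squared RKHS distance between the empirical mean embeddings of A and B.

    Kernel trick: ||mu_A - mu_B||^2 = mean k(a, a') + mean k(b, b') - 2 mean k(a, b).
    """
    kaa = sum(kernel(a, a2) for a in A for a2 in A) / len(A) ** 2
    kbb = sum(kernel(b, b2) for b in B for b2 in B) / len(B) ** 2
    kab = sum(kernel(a, b) for a in A for b in B) / (len(A) * len(B))
    return kaa + kbb - 2.0 * kab


def make_criterion(X0: List[Vector], X1: List[Vector]) -> Callable[[Subset], float]:
    """Hypothetical J(S): class separation in the RKHS restricted to features S."""
    def J(subset: Subset) -> float:
        A = [[x[j] for j in subset] for x in X0]
        B = [[x[j] for j in subset] for x in X1]
        return rkhs_mean_distance2(A, B, poly_kernel)
    return J


def branch_and_bound(n_features: int, d: int,
                     J: Callable[[Subset], float]) -> Tuple[Optional[Subset], float]:
    """Best size-d subset under a criterion that never grows when features are removed.

    The root holds all features and each tree level removes one feature. Removed
    indices increase along every path, so each subset is generated at most once.
    """
    best_score = -math.inf
    best_subset: Optional[Subset] = None

    def search(subset: Subset, start: int) -> None:
        nonlocal best_score, best_subset
        score = J(subset)
        if score <= best_score:
            return  # J is an upper bound for every descendant, so prune
        if len(subset) == d:
            best_score, best_subset = score, subset
            return
        to_remove = len(subset) - d
        # Stop early enough that to_remove - 1 removable positions remain afterwards.
        for pos in range(start, len(subset) - to_remove + 1):
            search(subset[:pos] + subset[pos + 1:], pos)

    search(tuple(range(n_features)), 0)
    return best_subset, best_score


def exhaustive(n_features: int, d: int, J: Callable[[Subset], float]) -> Subset:
    """Reference answer by scoring all C(n_features, d) subsets (for checking only)."""
    return max(itertools.combinations(range(n_features), d), key=J)


if __name__ == "__main__":
    # Toy data: features 0 and 1 separate the classes, features 2 and 3 are identical.
    X0 = [[0.0, 0.0, 0.5, 1.0], [0.2, 0.1, 1.0, 0.5], [0.1, 0.2, 0.0, 0.0]]
    X1 = [[1.0, 1.0, 0.5, 1.0], [1.2, 0.9, 1.0, 0.5], [1.1, 1.2, 0.0, 0.0]]
    J = make_criterion(X0, X1)
    subset, score = branch_and_bound(4, 2, J)
    print(subset)  # expected: (0, 1)
```

The pruning test `score <= best_score` is where the efficiency over an exhaustive (wrapper-like) search would come from. No classifier is trained, and whole subtrees are skipped once their root already scores no better than the best leaf found so far.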