Attribute selection is an important data preprocessing method in machine learning and pattern recognition. It matters most for high-dimensional data sets, whose high computational cost considerably degrades the performance of data mining algorithms. This paper therefore applies binary encoding to the glowworms of the continuous glowworm swarm optimization (GSO) algorithm and, in combination with a modified sigmoid function, proposes an attribute selection method based on binary glowworm swarm optimization. The method uses the fractal dimension of the data set as the evaluation criterion for attribute subsets and the binary glowworm swarm optimization algorithm as the search strategy. A series of experiments on standard UCI data sets shows that the method is feasible and effective.
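To make the two ingredients named above concrete, the following minimal Python sketch shows how a continuous glowworm position might be turned into a 0/1 attribute mask and how a candidate attribute subset might be scored by a fractal dimension. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not give the form of the modified sigmoid, so the standard logistic function stands in for it, and a simple box-counting estimate stands in for whichever fractal dimension estimator the paper uses. The names `sigmoid`, `binarize`, and `box_counting_dimension`, and the grid scales, are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Standard logistic function (a stand-in for the paper's modified sigmoid)."""
    x = max(-60.0, min(60.0, x))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-x))


def binarize(position, rng):
    """Assumed transfer rule: bit j is 1 with probability sigmoid(position[j])."""
    return [1 if rng.random() < sigmoid(x) else 0 for x in position]


def box_counting_dimension(rows, mask, scales=(2, 4, 8, 16, 32)):
    """Estimate the box-counting dimension of rows restricted to masked attributes.

    Each selected attribute is min-max scaled to [0, 1] and cut into k bins;
    the estimate is the least-squares slope of log(occupied cells) vs log(k).
    """
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols or not rows:
        return 0.0  # an empty subset carries no information
    lo = [min(r[j] for r in rows) for j in cols]
    hi = [max(r[j] for r in rows) for j in cols]
    log_k, log_n = [], []
    for k in scales:
        cells = set()
        for r in rows:
            # Grid cell index of this row; constant attributes map to bin 0.
            cell = tuple(
                min(int((r[j] - l) / (h - l) * k), k - 1) if h > l else 0
                for j, l, h in zip(cols, lo, hi)
            )
            cells.add(cell)
        log_k.append(math.log(k))
        log_n.append(math.log(len(cells)))
    mean_x = sum(log_k) / len(log_k)
    mean_y = sum(log_n) / len(log_n)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_k, log_n))
    den = sum((x - mean_x) ** 2 for x in log_k)
    return num / den


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data: attribute 1 duplicates attribute 0, attribute 2 is independent.
    rows = []
    for _ in range(4000):
        x, z = rng.random(), rng.random()
        rows.append((x, x, z))
    for mask in ([1, 1, 0], [1, 0, 1], [1, 1, 1]):
        print(mask, round(box_counting_dimension(rows, mask), 3))
    print(binarize([-2.0, 0.0, 2.0], rng))
```

On the synthetic data in the demo, adding the duplicated attribute to a subset leaves the estimate unchanged, which is the property that makes a fractal dimension usable as a subset criterion: attributes that do not raise the dimension are candidates for removal. A search strategy such as a binary swarm would call `binarize` to propose masks and `box_counting_dimension` to score them.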