Feature selection is commonly applied as a primary preprocessing step and is widely used in machine learning and data mining. Selecting a feature subset that captures the fractal characteristics of a data set is valuable for studying the fractal regularities of that data set. Based on the fractal characteristics of data sets, this paper introduces a density analysis method, points out the shortcomings of existing feature selection methods based on fractal dimension, and proposes a feature selection method based on fractals and changes in neighborhood space density. To assess the validity of the experimental results, an SVM classifier combined with K-fold cross-validation is used to measure classification performance on three data sets before and after feature selection. The experimental results show that the method performs well at feature selection compared with existing methods and yields a better feature subset.
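As an illustration of the fractal-dimension quantity that the existing methods mentioned above rely on, the following is a minimal sketch of a box-counting estimate of the correlation fractal dimension. It is not the method proposed in this paper, and the density analysis step is not shown. The function name `correlation_dimension`, the grid resolutions in `levels`, and the synthetic data are assumptions made only for this example. The general idea is that an attribute which leaves the estimated dimension unchanged carries little additional information.

```python
import math
from collections import Counter


def correlation_dimension(points, levels=(2, 4, 8, 16, 32)):
    """Estimate the correlation fractal dimension by box counting.

    points: list of equal-length tuples, coordinates scaled to [0, 1].
    levels: grid cells per axis at each resolution (illustrative default).
    """
    log_r, log_s = [], []
    for k in levels:
        # Count how many points fall into each occupied grid cell.
        cells = Counter(
            tuple(min(int(c * k), k - 1) for c in p) for p in points
        )
        # S(r) is the sum of squared cell occupancies at cell side r = 1/k.
        s = sum(n * n for n in cells.values())
        log_r.append(math.log(1.0 / k))
        log_s.append(math.log(s))
    # The dimension is the least-squares slope of log S(r) against log r.
    mean_r = sum(log_r) / len(log_r)
    mean_s = sum(log_s) / len(log_s)
    num = sum((x - mean_r) * (y - mean_s) for x, y in zip(log_r, log_s))
    den = sum((x - mean_r) ** 2 for x in log_r)
    return num / den


# Synthetic data (assumed for illustration): a regular 64 x 64 grid in the
# unit square, and the same grid with a third attribute that copies the first.
plane = [(i / 64, j / 64) for i in range(64) for j in range(64)]
plane_redundant = [(x, y, x) for x, y in plane]

d_plane = correlation_dimension(plane)
d_redundant = correlation_dimension(plane_redundant)
print(d_plane, d_redundant)
```

In this sketch the redundant third attribute does not increase the estimated dimension, which is the property a fractal-dimension-based selector would use to discard it. Real data sets are noisier, so the choice of resolutions and the fitting range would need care.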