This paper studies dimensionality reduction (attribute reduction) methods for big data with high-dimensional features. Feature selection and subspace learning are two common approaches to dimensionality reduction: feature selection yields interpretable results, whereas subspace learning usually achieves better classification performance, yet the two are typically applied independently of each other. To combine them, this paper proposes a new feature selection method that embeds two subspace learning techniques, linear discriminant analysis (LDA) and locality preserving projection (LPP), to preserve the global and the local structure of the data, respectively, and adds a sparse regularization term to perform feature selection. Experimental results, evaluated by classification accuracy, variance, and coefficient of variation, show that the proposed algorithm selects discriminative features more effectively than the compared algorithms and achieves good classification performance.
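The abstract does not state the objective function, so the following is only a minimal sketch of how an LDA term, an LPP term, and a sparse regularizer could be combined for feature selection. It is not the paper's algorithm. It assumes a least-squares surrogate for LDA (regression onto a centred class-indicator matrix), a binary k-nearest-neighbour graph Laplacian for the LPP term, an l2,1-norm as the sparse regularizer, and an iteratively reweighted solver. The names `knn_laplacian` and `select_features` and the parameters `beta`, `lam`, and `k` are hypothetical.

```python
import numpy as np

def knn_laplacian(X, k=5):
    """Unnormalized Laplacian of a symmetric k-NN graph with binary weights."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared distances
    np.fill_diagonal(d2, np.inf)                      # exclude self-neighbours
    idx = np.argsort(d2, axis=1)[:, :k]               # k nearest per sample
    S = np.zeros((n, n))
    S[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    S = np.maximum(S, S.T)                            # symmetrize the graph
    return np.diag(S.sum(axis=1)) - S

def select_features(X, y, beta=0.1, lam=1.0, k=5, n_iter=30, eps=1e-8):
    """Rank features by the row norms of W, where W approximately minimizes
    ||Y - XW||_F^2 + beta * tr(W^T X^T L X W) + lam * ||W||_{2,1}.
    This objective is an assumption made for illustration only."""
    X = X - X.mean(axis=0)                            # centre the features
    classes = np.unique(y)
    Y = (y[:, None] == classes[None, :]).astype(float)
    Y = Y - Y.mean(axis=0)                            # centred class indicators
    L = knn_laplacian(X, k)
    A = X.T @ X + beta * (X.T @ L @ X)                # global + local terms
    B = X.T @ Y
    D = np.eye(X.shape[1])                            # reweighting matrix
    for _ in range(n_iter):
        # Stationarity condition with D fixed: (A + lam * D) W = X^T Y
        W = np.linalg.solve(A + lam * D, B)
        row_norms = np.sqrt(np.sum(W ** 2, axis=1) + eps)
        D = np.diag(1.0 / (2.0 * row_norms))          # l2,1 reweighting
    scores = np.sqrt(np.sum(W ** 2, axis=1))          # per-feature importance
    ranking = np.argsort(scores)[::-1]                # most important first
    return ranking, scores

# Synthetic usage: features 0 and 1 carry the class signal, the rest are noise.
rng = np.random.default_rng(0)
n, d = 200, 10
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, d))
s = 2.0 * y - 1.0
X[:, 0] += 3.0 * s
X[:, 1] -= 3.0 * s
ranking, scores = select_features(X, y)
```

In this sketch the l2,1 penalty drives whole rows of `W` toward zero, so each feature's row norm can serve as its importance score, and keeping the top-ranked rows yields the selected feature subset. The weights `beta` and `lam` would need tuning on real data, for example by cross-validation.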