Heuristic feature selection strategies ignore the correlations between features and therefore tend to produce suboptimal feature subsets. To address this problem, a manifold discriminant feature selection (MDFS) algorithm is proposed. MDFS uses both neighbor information and label information to characterize the intra-class and inter-class manifold structures of high-dimensional data. It formulates its objective function by minimizing the difference between the intra-class and inter-class manifold scatters, and adds a structured sparse regularization term to reduce redundancy among features. The optimal feature subset is then obtained by iteratively optimizing the feature weights within a unified framework. Clustering experiments on the ORL, COIL20, and Isolet1 datasets show that the feature subsets selected by MDFS achieve higher accuracy and normalized mutual information (NMI) than those selected by traditional methods, which verifies the effectiveness of the proposed algorithm.
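The abstract does not give the exact objective, so the following is only a minimal sketch of how such a method could be assembled, under stated assumptions. It assumes 0/1-weighted k-nearest-neighbor graphs (k same-label neighbors for the intra-class graph, k different-label neighbors for the inter-class graph), the hypothetical objective min_W tr(W^T X^T (L_w - L_b) X W) + gamma * ||W||_{2,1} subject to W^T W = I, with the l2,1 norm standing in for the structured sparse term, and the common iterative reweighting scheme for l2,1 problems. The function names `knn_graphs`, `laplacian`, `mdfs_sketch` and the parameters `k`, `m`, `gamma`, `n_iter`, `eps` are all illustrative, not taken from the paper.

```python
import numpy as np


def knn_graphs(X, y, k=5):
    """Assumed graph construction: 0/1 weights, symmetrised k-NN graphs.

    A_w links each sample to its k nearest same-label samples (intra-class);
    A_b links each sample to its k nearest different-label samples (inter-class).
    """
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared Euclidean distances
    same = y[:, None] == y[None, :]
    A_w = np.zeros((n, n))
    A_b = np.zeros((n, n))
    for i in range(n):
        others = np.arange(n) != i
        intra = np.flatnonzero(same[i] & others)  # same label, excluding i itself
        inter = np.flatnonzero(~same[i])          # different label
        A_w[i, intra[np.argsort(dist[i, intra])[:k]]] = 1.0
        A_b[i, inter[np.argsort(dist[i, inter])[:k]]] = 1.0
    # i ~ j if either sample is among the other's neighbours
    return np.maximum(A_w, A_w.T), np.maximum(A_b, A_b.T)


def laplacian(A):
    """Unnormalised graph Laplacian L = D - A."""
    return np.diag(A.sum(axis=1)) - A


def mdfs_sketch(X, y, n_selected, m=None, k=5, gamma=1.0, n_iter=20, eps=1e-8):
    """Rank features by the row norms of W (hypothetical formulation):

        min_W tr(W^T X^T (L_w - L_b) X W) + gamma * ||W||_{2,1},  s.t. W^T W = I
    """
    d = X.shape[1]
    m = m or n_selected
    A_w, A_b = knn_graphs(X, y, k)
    # d x d "scatter difference": small for features that keep same-class
    # neighbours close and push different-class neighbours apart
    S = X.T @ (laplacian(A_w) - laplacian(A_b)) @ X
    D = np.eye(d)
    for _ in range(n_iter):
        # With D fixed, the minimiser is spanned by the m eigenvectors of
        # S + gamma * D with the smallest eigenvalues (eigh sorts ascending).
        _, vecs = np.linalg.eigh(S + gamma * D)
        W = vecs[:, :m]
        row_norms = np.sqrt(np.sum(W ** 2, axis=1) + eps)
        D = np.diag(1.0 / (2.0 * row_norms))  # reweighting step for ||W||_{2,1}
    scores = np.linalg.norm(W, axis=1)  # feature weights = row norms of W
    return np.argsort(-scores)[:n_selected], scores


# Usage on synthetic data: features 0 and 1 separate four classes,
# features 2..9 are pure noise.
rng = np.random.default_rng(0)
means = np.array([[0, 0], [0, 6], [6, 0], [6, 6]], dtype=float)
y = np.repeat(np.arange(4), 15)
X = np.hstack([means[y] + 0.3 * rng.standard_normal((60, 2)),
               rng.standard_normal((60, 8))])
selected, scores = mdfs_sketch(X, y, n_selected=2)
print(sorted(selected.tolist()))
```

The l2,1 penalty drives whole rows of W toward zero, so a feature is either kept (large row norm) or discarded together across all m projection directions. This is one common way to realize a "structured sparse" term that suppresses redundant features. Because L_w and L_b both annihilate the constant vector, centering X does not change S.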