This paper proposes a new measure of feature importance for categorical data and, building on a one-pass clustering algorithm, presents an unsupervised feature selection method. Theoretical analysis shows that the time complexity of the method is approximately linear in both the size of the dataset and the number of features, which makes it suitable for feature selection on large, high-dimensional datasets. Experiments on UCI datasets show that the proposed method performs well compared with classical feature selection methods from the literature, indicating that it is effective and practical.
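To make the clustering step concrete, the following is a minimal sketch of a generic one-pass (leader-style) clustering routine for categorical records. It is an illustration under stated assumptions, not the paper's algorithm: the abstract gives neither the distance function nor the threshold rule, so the per-feature frequency summary, the `distance` definition, and the `threshold` parameter are all hypothetical choices. The proposed feature importance measure is not defined in the abstract and is not sketched here.

```python
from collections import Counter


def distance(record, summary, size):
    """Assumed distance between a record and a cluster summary.

    For each feature, the cost is 1 minus the relative frequency of the
    record's value inside the cluster; the result is the mean over features.
    """
    total = sum(1.0 - counts[value] / size
                for value, counts in zip(record, summary))
    return total / len(record)


def one_pass_cluster(records, threshold):
    """Scan the records once, assigning each to its nearest cluster.

    A new cluster is opened when the nearest existing cluster is farther
    than `threshold` (a hypothetical tuning parameter).
    """
    clusters = []  # each cluster: {"size": int, "summary": [Counter, ...]}
    labels = []
    for record in records:
        best_idx, best_dist = None, None
        for idx, cluster in enumerate(clusters):
            d = distance(record, cluster["summary"], cluster["size"])
            if best_dist is None or d < best_dist:
                best_idx, best_dist = idx, d
        if best_idx is None or best_dist > threshold:
            clusters.append({"size": 0,
                             "summary": [Counter() for _ in record]})
            best_idx = len(clusters) - 1
        # Update the chosen cluster's per-feature value counts.
        chosen = clusters[best_idx]
        chosen["size"] += 1
        for counts, value in zip(chosen["summary"], record):
            counts[value] += 1
        labels.append(best_idx)
    return labels, clusters


records = [("a", "x"), ("a", "x"), ("b", "y"), ("a", "y")]
labels, clusters = one_pass_cluster(records, threshold=0.4)
```

In this sketch each record is compared once against every existing cluster across all features, so the cost grows linearly with the number of records and the number of features as long as the number of clusters stays bounded. That is consistent with the near-linear complexity stated above, although the paper's own analysis may rest on different details.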