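In multi-label learning, feature selection is an effective means of coping with the high dimensionality of multi-label data. Each label separates the samples to a different degree, which may provide useful information for multi-label learning. Based on this assumption, a multi-label feature selection algorithm based on label weighting is proposed. The algorithm first weights each label using the classification margin of the samples in the whole feature space, and then takes each feature's ability to distinguish samples under the whole label set as the feature weight, which measures the importance of the feature to the label set. Finally, the features are sorted in descending order of feature weight to obtain a new feature ranking. Experimental results on 6 multi-label datasets and 4 evaluation metrics show that the proposed algorithm outperforms several currently popular multi-label feature selection algorithms.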
In multi-label learning, each sample is described by a feature vector and is simultaneously associated with multiple class labels. Feature selection removes irrelevant and redundant features, which makes it an effective means of mitigating the curse of dimensionality for multi-label data. Labels differ in how well they separate the samples, and this difference may provide useful information for multi-label learning. Based on this assumption, a multi-label feature selection algorithm based on label weighting is proposed in this paper. First, the margin of each sample in the whole feature space is calculated and used to weight the labels. Then, the ability of each feature to distinguish samples under the whole label set is used as the feature weight, which measures the importance of the feature to the label set. Finally, all features are sorted in descending order of feature weight. Experiments were conducted on six multi-label datasets, and four evaluation criteria were used to measure the effectiveness of the method. Experimental results show that the proposed algorithm is superior to several state-of-the-art multi-label feature selection algorithms.
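The three steps described above (label weighting by margin, feature weighting under the label set, descending sort) can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything specific here is an assumption rather than the authors' definition: the margin is taken in the Relief style as the distance to the nearest sample with a different label value minus the distance to the nearest sample with the same label value, negative label weights are clipped to zero before normalization, and the function names `rank_features` and `_nearest` are hypothetical.

```python
import math


def _distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _nearest(X, i, candidates):
    """Index of the candidate closest to sample i in the whole feature space."""
    return min(candidates, key=lambda j: _distance(X[i], X[j]))


def rank_features(X, Y):
    """Rank features for multi-label data (illustrative sketch, not the paper's formulas).

    X: list of n feature vectors (length d); Y: list of n binary label vectors (length q).
    Returns (ranking, feature_weights, label_weights).
    """
    n, d, q = len(X), len(X[0]), len(Y[0])
    label_margin = [0.0] * q                        # mean sample margin per label
    feature_margin = [[0.0] * d for _ in range(q)]  # mean per-feature margin per label

    for l in range(q):
        used = 0
        for i in range(n):
            hits = [j for j in range(n) if j != i and Y[j][l] == Y[i][l]]
            misses = [j for j in range(n) if Y[j][l] != Y[i][l]]
            if not hits or not misses:
                continue  # margin undefined for this sample under label l
            h, m = _nearest(X, i, hits), _nearest(X, i, misses)
            # Assumed margin: nearest-miss distance minus nearest-hit distance.
            label_margin[l] += _distance(X[i], X[m]) - _distance(X[i], X[h])
            for f in range(d):
                feature_margin[l][f] += abs(X[i][f] - X[m][f]) - abs(X[i][f] - X[h][f])
            used += 1
        if used:
            label_margin[l] /= used
            feature_margin[l] = [s / used for s in feature_margin[l]]

    # Step 1: label weights from margins (assumption: clip negatives, normalize to sum 1).
    clipped = [max(w, 0.0) for w in label_margin]
    total = sum(clipped)
    label_weights = [w / total for w in clipped] if total > 0 else [1.0 / q] * q

    # Step 2: feature weight = label-weighted sum of per-feature margins.
    feature_weights = [
        sum(label_weights[l] * feature_margin[l][f] for l in range(q)) for f in range(d)
    ]

    # Step 3: sort features in descending order of weight.
    ranking = sorted(range(d), key=lambda f: feature_weights[f], reverse=True)
    return ranking, feature_weights, label_weights


# Toy data (made up for illustration): feature 0 separates label 0 cleanly,
# while label 1 depends only loosely on feature 1.
X = [[0.0, 0.5], [0.1, 0.9], [0.2, 0.1], [1.0, 0.4], [1.1, 0.8], [0.9, 0.2]]
Y = [[0, 1], [0, 1], [0, 0], [1, 0], [1, 1], [1, 0]]
ranking, feature_weights, label_weights = rank_features(X, Y)
print(ranking, feature_weights, label_weights)
```

In this sketch a label that separates the samples well receives a larger weight, so the features that discriminate under that label dominate the final ranking. The brute-force nearest-neighbour search is quadratic in the number of samples per label and is meant only to make the idea concrete.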