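Multi-label learning studies a class of complex problems in which a single object has multiple labels simultaneously. Text annotation, video content annotation, image recognition, and protein function discovery all belong to this class of tasks. Like single-label learning, multi-label learning also faces the challenge of high data dimensionality. Some reduction algorithms have been designed for multi-label data, but compared with single-label reduction algorithms they are few in number and severely limited. With the arrival of the big data era, collecting large numbers of samples has become increasingly easy, but labeling all the collected samples is impractical. This poses three challenges for researchers who want to use rough set models to solve multi-label learning problems: higher data dimensionality, the limitations of existing rough sets, and the appearance of partially labeled decision tables. To address these three challenges, a local rough set model for multi-label learning is proposed, and some interesting properties are obtained. Finally, using the local rough set model, a heuristic reduction algorithm for multi-label data is designed, and its effectiveness is verified on three public multi-label datasets.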
Multi-label learning is a learning task in which, unlike in single-label learning, each object is associated with a set of concept labels at the same time. It has received increasing attention because such problems are widespread in the real world. In text labeling, each document may be annotated with more than one label; for example, a web page on the economy may belong to several predefined topics, such as Buffett and stock, simultaneously. In automatic scene annotation, each scene may be annotated with several topical words; for instance, an image showing a polar bear in the Arctic may be associated with annotated words such as bear and ice. In functional proteomics research, each protein may exhibit multiple functions at once. All of these cases are multi-label learning tasks. Like single-label learning, multi-label learning suffers from the curse of dimensionality. Attribute reduction is an effective means of reducing the dimensionality of the data, and it can improve the performance of multi-label classifiers. There are a large number of attribute reduction methods for single-label learning, but only a few have been designed for multi-label learning, and the existing ones have high computational complexity. In particular, in the context of big data, collecting a large amount of data is becoming easier and easier, but labeling all of it is unrealistic. Analyzing multi-label problems on such partially labeled data sets with existing rough set models raises three challenges: higher dimensionality, the limitations of existing rough sets, and the appearance of partial label decision tables. Meanwhile, semi-supervised multi-label learning is a new research direction. To address these challenges and further exploit the information in unlabeled samples, local rough sets for multi-label learning are introduced, and some interesting properties are obtained. Finally, a heuristic reduction method is designed by applying local rough sets for multi-label learning. Its effectiveness is verified on three public multi-label datasets.