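Many features in high-dimensional data are irrelevant or redundant, which poses a major challenge to traditional learning algorithms. Feature selection emerged to address this problem. At the same time, in many practical problems the data have multiple views and labels are hard to obtain, so multi-view learning and semi-supervised learning have become hot topics in machine learning. This paper studies how to select a maximum-relevance, minimum-redundancy feature subset from "partially labeled" multi-view data, and proposes a multi-view semi-supervised feature selection method. To eliminate redundant and irrelevant features, the method explores the complementary information contained in multi-view data and the redundancy among different features within each view, and uses the information carried by a small amount of labeled data together with the unlabeled data to perform feature selection simultaneously. Experimental results verify that the proposed algorithm achieves good feature selection and clustering performance.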
Many features in high-dimensional data are redundant or irrelevant. To tackle this problem, the concept of feature selection was introduced. Meanwhile, many problems in machine learning involve examples that naturally comprise multiple views and have only a limited number of labels, so multi-view learning and semi-supervised learning have become hot topics in machine learning. This paper therefore investigates how to select relevant features with minimum redundancy from multi-view data with a limited number of labels, and proposes a semi-supervised feature selection and clustering framework. To remove redundant and irrelevant features, the framework exploits the relations among views and the relations among features within each view, and uses a limited amount of labeled data to guide feature selection. The proposed framework is systematically evaluated on multi-view datasets, and the results demonstrate the effectiveness and potential of the proposed method.