Traditional supervised learning generally requires a large amount of labeled data as training examples. In many real tasks, however, a large amount of data is easy to acquire, but labeling it costs considerable human and material resources. When only a small amount of labeled data is available, can learning performance be improved by exploiting the large amount of unlabeled data? This question has made semi-supervised learning one of the hot topics in machine learning over the past decade or so. Disagreement-based semi-supervised learning is one of the mainstream paradigms in this field. It trains multiple learners to exploit the unlabeled data, and the "disagreement" among the learners is crucial to its effectiveness. This article briefly surveys some research advances in this paradigm.
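To make the idea of multiple learners exploiting unlabeled data concrete, the following is a minimal co-training-style sketch. Co-training is a classic disagreement-based method, but this code is our own illustration and not an algorithm taken from the article. It rests on several assumptions: each example has two feature views, the base learners are nearest-centroid classifiers, and a learner's confidence is the gap between its two nearest class-centroid distances. All function names (`co_train`, `co_predict`, and so on) are hypothetical. In each round, each learner labels the unlabeled example it is most confident about and adds that example to the *other* learner's training set.

```python
from typing import Dict, List, Tuple

# Hypothetical types for this sketch: a feature vector, and a model mapping
# each class label to the centroid (mean vector) of its training examples.
Vector = Tuple[float, ...]
Model = Dict[int, List[float]]


def centroid_fit(xs: List[Vector], ys: List[int]) -> Model:
    """Assumed base learner: nearest centroid, i.e. average the vectors of each class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in zip(xs, ys):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        for i, v in enumerate(x):
            sums[y][i] += v
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict_with_margin(model: Model, x: Vector) -> Tuple[int, float]:
    """Nearest class plus a confidence margin (2nd-nearest minus nearest squared distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, c)), y) for y, c in model.items()
    )
    best_d, best_y = dists[0]
    margin = dists[1][0] - best_d if len(dists) > 1 else float("inf")
    return best_y, margin


def _fit(pairs: List[Tuple[Vector, int]]) -> Model:
    return centroid_fit([x for x, _ in pairs], [y for _, y in pairs])


def _most_confident(model: Model, xs: List[Vector]) -> int:
    """Index of the example that the model labels with the largest margin."""
    return max(range(len(xs)), key=lambda i: predict_with_margin(model, xs[i])[1])


def co_train(labeled, unlabeled, rounds: int) -> Tuple[Model, Model]:
    """Co-training sketch: labeled items are (view1, view2, label); unlabeled are (view1, view2)."""
    pool = list(unlabeled)  # copy so the caller's list is left untouched
    train1 = [(v1, y) for v1, _, y in labeled]  # learner 1 sees only view 1
    train2 = [(v2, y) for _, v2, y in labeled]  # learner 2 sees only view 2
    for _ in range(rounds):
        if not pool:
            break
        m1, m2 = _fit(train1), _fit(train2)
        # Learner 1 labels its most confident unlabeled example for learner 2.
        i = _most_confident(m1, [v1 for v1, _ in pool])
        label = predict_with_margin(m1, pool[i][0])[0]
        train2.append((pool.pop(i)[1], label))
        if not pool:
            break
        # Learner 2 labels its most confident unlabeled example for learner 1.
        i = _most_confident(m2, [v2 for _, v2 in pool])
        label = predict_with_margin(m2, pool[i][1])[0]
        train1.append((pool.pop(i)[0], label))
    return _fit(train1), _fit(train2)


def co_predict(m1: Model, m2: Model, x: Tuple[Vector, Vector]) -> int:
    """Combine the two learners; if they disagree, defer to the larger margin."""
    y1, c1 = predict_with_margin(m1, x[0])
    y2, c2 = predict_with_margin(m2, x[1])
    return y1 if c1 >= c2 else y2
```

Deferring to the more confident learner is only one simple way to resolve a disagreement at prediction time. Voting or averaging the learners' scores are common alternatives.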