Diversity-based semi-supervised learning combines semi-supervised learning with ensemble learning, and in recent years it has become an active research topic in machine learning. However, its theoretical foundations remain underdeveloped, and none of the existing analyses account for distribution noise. This paper first defines a hybrid classification and distribution (HCAD) noise that reflects the characteristics of diversity-based semi-supervised learning. It then gives a probably approximately correct (PAC) analysis of diversity-based semi-supervised learning under HCAD noise, along with an example application of the resulting theorem. Finally, it uses the voting margin to derive an upper bound on the generalization error of multi-classifier systems under HCAD noise and proves this bound. These results can guide the design of diversity-based semi-supervised learning algorithms and the evaluation of their generalization ability, so they have broad potential applications.
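Margin-based generalization bounds for voting ensembles are usually stated in terms of the fraction of training examples whose voting margin is at most some threshold θ. The sketch below computes that quantity. It assumes the standard weighted-vote margin of Schapire et al.: the vote mass on the true label minus the largest vote mass on any other label. The paper's own "voting margin function" may be defined differently, and the sketch does not model HCAD noise. The function names `voting_margins` and `margin_error` are hypothetical.

```python
import numpy as np

def voting_margins(votes, weights, labels, n_classes):
    """Weighted-vote margin of an ensemble on each example (assumed standard definition).

    votes:   (T, n) int array; votes[t, i] is the label classifier t predicts for example i
    weights: (T,) nonnegative classifier weights (normalized internally)
    labels:  (n,) true labels
    Returns an (n,) array of margins in [-1, 1].
    """
    votes = np.asarray(votes)
    labels = np.asarray(labels)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # total vote mass per example becomes 1
    T, n = votes.shape
    idx = np.arange(n)

    # mass[i, k] = total weight of classifiers voting for class k on example i
    mass = np.zeros((n, n_classes))
    for t in range(T):
        # each row index i appears once here, so += does not lose repeated updates
        mass[idx, votes[t]] += w[t]

    correct = mass[idx, labels]          # vote mass on the true label
    others = mass.copy()
    others[idx, labels] = -np.inf        # exclude the true label
    best_wrong = others.max(axis=1)      # strongest competing label
    return correct - best_wrong

def margin_error(margins, theta):
    """Empirical fraction of examples whose margin is <= theta."""
    return float(np.mean(np.asarray(margins) <= theta))

# Usage: 3 equally weighted classifiers, 4 examples, 3 classes (toy data)
votes = [[0, 1, 2, 0],
         [0, 1, 1, 1],
         [0, 2, 1, 2]]
labels = [0, 1, 1, 0]
m = voting_margins(votes, [1, 1, 1], labels, n_classes=3)
print(m)                      # margins: 1, 1/3, 1/3, 0
print(margin_error(m, 0.0))   # 0.25
print(margin_error(m, 0.5))   # 0.75
```

A negative margin means the ensemble misclassifies the example. Bounds of this form add a complexity term to `margin_error(m, theta)`, so raising θ trades a larger empirical term against a smaller complexity term.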