Efficient learning from large amounts of unlabeled samples is an important problem in data stream environments. Active learning based on a disagreement strategy is an effective solution, but such algorithms usually consider only the boundary samples with the largest disagreement. They do not address how to correct samples with small disagreement that were misjudged in the early stage of training. To address this, an efficient learning method that combines active learning and ensemble learning on the basis of disagreement evaluation is proposed. The method adopts different selection strategies for unlabeled samples depending on the sample disagreement and the training stage. To evaluate its performance, experiments were conducted on artificial stream data and on HEp-2 cell image data. The results show that, compared with the existing Qboost method, the proposed method requires fewer training samples and achieves higher classification accuracy.
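The idea of stage-dependent, disagreement-based sample selection can be illustrated with a minimal sketch. The abstract does not specify the disagreement measure or the selection rules, so everything here is an assumption made for illustration: vote entropy stands in for the disagreement measure, the names `vote_entropy` and `select_queries`, the stage labels `"early"` and `"late"`, and the parameters `high` and `audit_every` are hypothetical, and the early-stage rule (also sending a fraction of low-disagreement samples for labeling so that early misjudgments can be corrected) is only one plausible reading of the method.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Disagreement of an ensemble on one sample (assumed measure).

    votes: list of class labels, one per ensemble member.
    Returns the entropy of the vote distribution; 0 means full agreement.
    """
    n = len(votes)
    counts = Counter(votes)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def select_queries(committee_votes, stage, high=0.6, audit_every=3):
    """Return indices of unlabeled samples to send for labeling.

    committee_votes: one list of member votes per unlabeled sample.
    stage: "early" or "late" (hypothetical stage labels).
    high: disagreement threshold for boundary samples (assumed value).
    audit_every: in the early stage, every audit_every-th low-disagreement
        sample is also queried so misjudged labels can be corrected
        (assumed rule, not taken from the paper).
    """
    scores = [vote_entropy(v) for v in committee_votes]
    # Boundary samples: the ensemble members disagree strongly.
    picked = [i for i, s in enumerate(scores) if s >= high]
    if stage == "early":
        # Early ensembles are unreliable, so agreement may still be wrong.
        low = [i for i, s in enumerate(scores) if s < high]
        picked += low[::audit_every]
    return sorted(picked)


if __name__ == "__main__":
    votes = [[0, 0, 0], [0, 1, 1], [1, 1, 1], [0, 0, 1],
             [0, 0, 0], [1, 1, 1], [1, 1, 1]]
    print(select_queries(votes, stage="early"))
    print(select_queries(votes, stage="late"))
```

In this sketch the late stage queries only the high-disagreement samples, as a conventional disagreement-based strategy would, while the early stage additionally audits some samples on which the ensemble agrees.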