In semi-supervised learning, when the unlabeled samples are drawn from a different distribution than the labeled samples, the classifier drifts away from the topic of the target data and its accuracy decreases. This paper applies transfer learning to this problem and proposes a classification model called TranCo-training, which combines transfer learning with co-training. At each iteration, the transfer ability of each unlabeled instance is computed from the consistency between its classification and that of its neighboring labeled instances, and instances are transferred from the auxiliary dataset to the target dataset according to their transfer ability. Theoretical analysis indicates that the transfer ability of an auxiliary instance is inversely proportional to its training error loss, so the method minimizes the training error loss and avoids negative transfer, thereby solving the problem of topic bias in semi-supervised learning. Experimental results show that TranCo-training outperforms RdCo-training, which selects unlabeled instances at random, especially when only a few labeled target instances and a large number of unlabeled auxiliary instances are available.
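The neighbor-consistency idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact scoring formula, so the code below assumes a simple stand-in: the transfer ability of an auxiliary instance is taken to be the fraction of its k nearest labeled neighbors whose labels agree with the label predicted for it. The function names `transferability` and `select_transferable`, the parameters `k` and `top_n`, and the use of Euclidean distance are all hypothetical choices for illustration, and the predicted labels are assumed to come from the co-training classifiers.

```python
import numpy as np

def transferability(x_unlabeled, predicted, X_labeled, y_labeled, k=3):
    """Assumed score: share of the k nearest labeled neighbors
    whose label matches the label predicted for x_unlabeled."""
    # Euclidean distance from the unlabeled instance to every labeled one.
    dists = np.linalg.norm(X_labeled - x_unlabeled, axis=1)
    # Indices of the k closest labeled instances.
    nearest = np.argsort(dists, kind="stable")[:k]
    return float(np.mean(y_labeled[nearest] == predicted))

def select_transferable(X_aux, pred_aux, X_labeled, y_labeled, k=3, top_n=2):
    """Score every auxiliary instance and return the indices of the
    top_n highest-scoring ones, together with all scores."""
    scores = np.array([
        transferability(x, p, X_labeled, y_labeled, k)
        for x, p in zip(X_aux, pred_aux)
    ])
    # Stable sort on negated scores: highest first, ties keep input order.
    order = np.argsort(-scores, kind="stable")[:top_n]
    return order, scores
```

In a co-training loop, the instances returned by `select_transferable` would be added to the labeled pool with their predicted labels before the classifiers are retrained; selecting at random instead would correspond to the RdCo-training baseline mentioned above.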