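Traditional cross-domain classification learning generally considers balanced learning from a single source domain to a single target domain, but real-world data are often imbalanced. When such methods are applied to imbalanced classification problems, classifier bias causes their classification accuracy and noise robustness to degrade to varying degrees. To overcome the imbalance between domains, an imbalanced multisource cross-domain classification algorithm (imbalance multisource classification on cross-domain learning, IMCCL) is proposed. Based on the logistic regression model and the maximum posterior probability rule, both of which have been shown to be effective in numerous experiments, the algorithm builds multiple training-domain classifiers that jointly guide the classification of target-domain data. To make full and efficient use of large-sample source-domain data and to support fast computation on large samples, a fast version of IMCCL (IMCCL-CDdual) is proposed by incorporating the CDdual algorithm. Experiments on text classification and image recognition show that the algorithm achieves a high recognition rate, fast recognition speed, robustness to interference, and domain adaptability.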
Most current cross-domain classifiers are designed for a single source domain and a single target domain, and they assume that the two domains are balanced. This assumption is often violated in the real world. When such classifiers are applied to imbalanced domains, their classification performance and robustness to noise degrade heavily. For example, a Bayesian classifier depends heavily on estimating the sample distributions of the source and target domains; when a large source domain but only a small target domain is available, its classification accuracy drops considerably. To address this imbalance and to use the abundant source-domain data for effective transfer learning between a small target domain and multiple source domains, a novel fast cross-domain classification method called IMCCL is proposed for "small-target + multisource" datasets. IMCCL is built on the logistic regression model and the maximum a posteriori (MAP) rule. IMCCL is then combined with a recent advance, the CDdual algorithm, to develop its fast version, IMCCL-CDdual, for "small-target + large-multisource" domains. This fast classification method is also analyzed theoretically. Experimental results on artificial and real datasets indicate the effectiveness of IMCCL-CDdual in terms of classification accuracy, classification speed, robustness, and domain adaptation.
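The general idea of training one logistic regression classifier per source domain and labeling a target sample by the largest combined posterior can be sketched as follows. This is only an illustrative sketch and not the IMCCL algorithm itself: the abstract does not state how the source classifiers are combined, so equal-weight averaging of posteriors is an assumption here, plain batch gradient descent stands in for the CDdual solver, and the synthetic data and all function names (`train_logistic`, `predict_multisource`, `make_domain`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def posterior(w, x):
    # P(y = 1 | x) under a logistic model; the last entry of w is the bias.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def train_logistic(X, y, lr=0.1, epochs=200, l2=0.01):
    # Batch gradient descent on the L2-regularised log loss.
    # (Assumption: a simple stand-in for the CDdual solver named in the text.)
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = posterior(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        for j in range(d):
            w[j] -= lr * (grad[j] / n + l2 * w[j])
        w[-1] -= lr * grad[-1] / n  # bias is not regularised
    return w


def predict_multisource(models, x):
    # Assumption: average the source posteriors with equal weights,
    # then choose the label with the larger averaged posterior.
    p1 = sum(posterior(w, x) for w in models) / len(models)
    return 1 if p1 >= 0.5 else 0


def make_domain(rng, n_per_class, shift):
    # Hypothetical synthetic domain: two Gaussian blobs, one per class.
    X, y = [], []
    for label, centre in ((1, 1.5), (0, -1.5)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre + shift, 1.0),
                      rng.gauss(centre + shift, 1.0)])
            y.append(label)
    return X, y


rng = random.Random(0)
# Two source domains of very different sizes and slightly different distributions.
sources = [make_domain(rng, 200, 0.0), make_domain(rng, 20, 0.3)]
models = [train_logistic(X, y) for X, y in sources]

for x in ([2.0, 2.0], [-2.0, -2.0]):
    print(x, "->", predict_multisource(models, x))
```

Equal weighting is the simplest possible choice; a scheme that weights each source classifier by how well it agrees with the few labeled target samples would be a natural refinement, but nothing in the abstract confirms which scheme the authors use.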