Fast Cross-Domain Classification Learning for Large-Sample Multisource Domains and a Small Target Domain

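ISSN: 1000-1239
Journal: Journal of Computer Research and Development (《计算机研究与发展》)
Date: 0
Classification: TP391 [Automation and Computer Technology: Computer Application Technology; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] School of Digital Media, Jiangnan University, Wuxi, Jiangsu 214122; [2] Jiangsu North Huguang Opto-Electronics Co., Ltd., Wuxi, Jiangsu 214035
Funding: National Natural Science Foundation of China (61170122, 61272210); Natural Science Foundation of Jiangsu Province (BK2011003); Jiangsu Province 333 High-Level Talent Training Project (BRA2011192)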

Keywords: cross-domain, multi-source, logistic regression, posterior probability, classification, imbalance

Chinese abstract (translated):

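Traditional cross-domain classification learning generally assumes balanced learning from a single source domain to a single target domain, but real-world data are often imbalanced. When such methods are applied to imbalanced classification problems, classifier bias causes their classification accuracy and noise robustness to degrade to varying degrees. To overcome the imbalance between domains, an imbalanced multisource cross-domain classification algorithm (imbalance multisource classification on cross-domain learning, IMCCL) is proposed. Based on the logistic regression model and the maximum a posteriori rule, both shown to be effective in many experiments, the algorithm builds multiple training-domain classifiers and combines them to guide the classification of target-domain data. To make full and efficient use of large-sample source-domain data and to support fast computation on large samples, a fast version of IMCCL (IMCCL-CDdual) is proposed by incorporating the CDdual algorithm. Experiments on text classification and image recognition show that the algorithm achieves a high recognition rate, fast recognition speed, robustness to interference, and domain adaptability.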

English abstract:

Most current cross-domain classifiers are designed for a single source domain and a single target domain, and assume that the two domains are balanced. However, this assumption is often violated in the real world. When these classifiers are applied to imbalanced domains, their classification performance and robustness to noise degrade heavily. For example, a Bayesian classifier depends heavily on estimating the sample distributions of the source and target domains. When a large source domain but only a small target domain is available, the classification accuracy of this classifier degrades considerably. To address this imbalance and to use the abundant source-domain data for effective transfer learning between a small target domain and multiple source domains, a novel fast cross-domain classification method called IMCCL is proposed for "small-target + multisource" datasets. IMCCL is rooted in the logistic regression model and the maximum a posteriori (MAP) rule. IMCCL is then integrated with a recent advance, the CDdual algorithm, to develop its fast version, IMCCL-CDdual, for "small-target + large-multisource" domains. This fast classification method is also analyzed theoretically. Experimental results on artificial and real datasets indicate the effectiveness of IMCCL-CDdual in classification accuracy, classification speed, robustness, and domain adaptation.
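
The abstract gives only the outline of IMCCL: one logistic regression classifier per source domain, combined through a maximum a posteriori decision on target data. The sketch below illustrates that general idea only and is not the authors' algorithm. The weighting scheme (each source classifier weighted by its accuracy on the small labeled target set), the synthetic Gaussian data, and all function names are assumptions made for illustration, and plain gradient descent stands in for the CDdual solver, which the abstract does not describe.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(X, y, lr=0.5, epochs=300, l2=1e-3):
    # Binary logistic regression (labels 0/1) fitted by batch gradient descent.
    # Assumption: a stand-in for the paper's solver, not CDdual.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def posterior(model, x):
    # P(y = 1 | x) under one source-domain classifier.
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def predict_map(models, weights, x):
    # Weighted average of the per-source posteriors, then pick the class
    # with the larger combined posterior (MAP decision).
    p1 = sum(wt * posterior(m, x) for m, wt in zip(models, weights)) / sum(weights)
    return 1 if p1 >= 0.5 else 0

def accuracy(model, X, y):
    hits = sum((posterior(model, x) >= 0.5) == (yi == 1) for x, yi in zip(X, y))
    return hits / len(y)

def make_domain(rng, n, shift):
    # Hypothetical synthetic domain: two Gaussian classes, first feature shifted.
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.5 if label else -1.5
        X.append([rng.gauss(center + shift, 1.0), rng.gauss(center, 1.0)])
        y.append(label)
    return X, y

rng = random.Random(0)
# Three large source domains and one small target domain.
sources = [make_domain(rng, 200, s) for s in (-0.3, 0.0, 0.3)]
target_X, target_y = make_domain(rng, 20, 0.1)

models = [train_logreg(X, y) for X, y in sources]
# Assumed weighting: accuracy on the small labeled target set, plus a small
# constant so the weights never sum to zero.
weights = [accuracy(m, target_X, target_y) + 1e-6 for m in models]
preds = [predict_map(models, weights, x) for x in target_X]
target_acc = sum(p == t for p, t in zip(preds, target_y)) / len(target_y)
print("target accuracy:", target_acc)
```

In this sketch the source domains carry the training load while the small target set is used only to weight the source classifiers, which is one simple way to reflect the "small-target + large-multisource" setting.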
