Traditional machine learning assumes that test samples and training samples are drawn from the same probability distribution. In many current learning scenarios, however, the training and test samples may come from different distributions. Domain adaptation learning can effectively solve learning problems in which the training and test distributions are inconsistent, and as an emerging research area of machine learning it has attracted wide attention in recent years. In view of the importance of domain adaptation learning techniques, this paper surveys the research progress on domain adaptation learning. It first outlines the basic problems of domain adaptation learning and summarizes the important domain adaptation methods that have appeared in recent years. It then introduces the classical domain adaptation learning theories proposed in recent years and the current hot research directions, including instance-weighting domain adaptation, feature-representation domain adaptation, parameter- and feature-decomposition domain adaptation, and multi-source domain adaptation. A theoretical analysis of domain adaptation learning follows, discussing effective distribution metrics and giving the corresponding error bounds. Next, new research progress on domain adaptation learning in three aspects (algorithms, model structures, and practical applications) is reviewed. Finally, open problems in four aspects are discussed: feature transformation and assumptions, training optimization, model and data representation, and open problems in NLP research.
Traditional supervised learning algorithms assume that the training data and the test data are drawn from the same probability distribution. In many cases, however, this assumption is too simplistic and too restrictive for modern applications of machine learning. Domain adaptation approaches address the problem that arises when the data distribution in the test domain differs from that in the training domain. Although domain adaptation is a fundamental problem in machine learning, it only began to receive substantial attention fairly recently. In view of the theoretical and practical significance of domain adaptation methods, this paper surveys learning algorithms for domain adaptation. First, the basic issues of domain adaptation and several important domain adaptation methods are summarized. Next, learning theory and hot research directions in domain adaptation are described, including instance-weighting-based methods, feature-representation-based methods, parameter- and feature-decomposition-based methods, and domain adaptation with multiple sources. Third, a theoretical analysis of domain adaptation and effective distribution metric learning are presented, together with the corresponding error bounds. Fourth, recent research progress on domain adaptation in three aspects is reviewed: learning algorithms, model structures, and practical applications. Finally, open problems in feature transformation and assumptions, optimization algorithms, data representation and models, and NLP research are discussed.
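To make the instance-weighting family of methods named above concrete, the following is a minimal sketch (not from the surveyed papers; the toy data, function names, and hyperparameters are all illustrative assumptions). It estimates the density ratio p_target(x)/p_source(x) with a domain classifier that distinguishes source from target samples, then trains the task classifier on source data re-weighted by that ratio:

```python
import numpy as np

def fit_logreg(X, y, sample_weight=None, lr=0.1, n_iter=2000):
    """Weighted logistic regression fitted by plain gradient descent."""
    if sample_weight is None:
        sample_weight = np.ones(len(y))
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        grad = Xb.T @ (sample_weight * (p - y)) / sample_weight.sum()
        w -= lr * grad
    return w

def predict(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (Xb @ w > 0).astype(int)

rng = np.random.default_rng(0)
# Hypothetical covariate-shift setting: the target features are the
# source features shifted by one unit; the labeling rule is unchanged.
X_src = rng.normal(0.0, 1.0, size=(500, 2))
X_tgt = rng.normal(1.0, 1.0, size=(500, 2))
y_src = (X_src.sum(axis=1) > 0).astype(int)
y_tgt = (X_tgt.sum(axis=1) > 0).astype(int)

# Step 1: train a domain classifier (source = 0, target = 1); its odds
# at a source point approximate the density ratio p_tgt(x) / p_src(x).
X_dom = np.vstack([X_src, X_tgt])
y_dom = np.concatenate([np.zeros(500), np.ones(500)])
w_dom = fit_logreg(X_dom, y_dom)
Xb_src = np.hstack([X_src, np.ones((500, 1))])
weights = np.exp(Xb_src @ w_dom)  # odds p(target|x) / p(source|x)

# Step 2: train the task classifier on source data, re-weighted so the
# effective training distribution resembles the target distribution.
w_task = fit_logreg(X_src, y_src, sample_weight=weights)
acc = (predict(w_task, X_tgt) == y_tgt).mean()
```

Other instance-weighting schemes surveyed in the literature differ mainly in how the weights are estimated (e.g., kernel mean matching rather than a probabilistic domain classifier), but they share this re-weighted empirical risk minimization structure.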