A Heterogeneous Transductive Transfer Learning Algorithm

ISSN: 1000-9825
Journal: 《软件学报》 (Journal of Software)
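Classification: TP181 [Automation and Computer Technology: Control Science and Engineering; Automation and Computer Technology: Control Theory and Control Engineering]
Author affiliations: [1] Beijing Key Laboratory of Traffic Data Analysis and Mining (Beijing Jiaotong University), Beijing 100044; [2] College of Mathematics and Information Science, Hebei University, Baoding, Hebei 071000; [3] Hebei Province Key Laboratory of Machine Learning and Computational Intelligence (Hebei University), Baoding, Hebei 071000
Funding: National Natural Science Foundation of China (61375062, 61370129); Specialized Research Fund for the Doctoral Program of Higher Education (20120009110006); Fundamental Research Funds for the Central Universities (2014JBM029); Science and Technology Plan of the Hebei Provincial Department of Science and Technology (13210347); Hebei Provincial Department of Education funded project (QN20131006); CCF-Tencent Research Fund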

Keywords: heterogeneous transfer learning, transductive transfer learning, heterogeneous feature space, mapping function

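Chinese abstract (translated):

Learning performance suffers when the target domain has few labeled examples, while related source domains often contain some labeled data. Transfer learning addresses this situation by applying knowledge learned on a source domain, which differs from but is related to the target domain, to the target domain. In real applications such as text-to-image and cross-language transfer learning, the source and target domains have different feature spaces; this is heterogeneous transfer learning. The focus here is on using labeled data in the source domain to improve learning performance on unlabeled data in the target domain, a setting known as heterogeneous transductive transfer learning. Because the feature spaces differ, a key problem in heterogeneous transfer learning is learning a mapping function from the source domain to the target domain. The paper proposes learning the mapping function by unsupervised matching of the source and target feature spaces. The learned mapping function re-represents source-domain data in the target domain, so the re-represented labeled source data can be transferred to the target domain. A standard machine learning method (for example, a support vector machine) can then be used to train a classifier that predicts the class labels of unlabeled target-domain data. A probabilistic interpretation is given to show that the method is robust to some noise in the data. A sample complexity bound, that is, the number of samples needed to find the mapping function, is also derived. Experimental results on four real-world data sets demonstrate the effectiveness of the method.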

English abstract:

A lack of labeled data degrades learning performance in the target domain. Fortunately, ample labeled data often exist in other, related source domains. Transfer learning allows knowledge to be transferred from source domains to the target domain. In real applications, such as text-image and cross-language transfer learning, the feature spaces of the source and target domains are different; this setting is heterogeneous transfer learning. This paper focuses on heterogeneous transductive transfer learning (HTTL), an approach that improves learning performance on unlabeled data in the target domain by using labeled data in heterogeneous source domains. Since the feature spaces of the source domains and the target domain are different, the key problem is to learn the mapping functions between the heterogeneous source domains and the target domain. This paper proposes to learn the mapping functions by unsupervised matching across the different feature spaces. The data in the source domains can be re-represented with the mapping functions and transferred to the target domain. Thus, the target domain gains some labeled data that come from the source domains. Standard machine learning methods such as the support vector machine can then be used to train classifiers that predict the labels of unlabeled data in the target domain. Moreover, a probabilistic interpretation is derived to verify the robustness of the presented method to certain noise in the utility matrices. A sample complexity bound is given to indicate how many instances are needed to adequately find the mapping functions. The effectiveness of the proposed approach is verified by experiments on four real-world data sets.
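
The transfer step described in the abstract (re-represent labeled source data in the target feature space, then train a standard classifier on it) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not say how the mapping is learned, so the linear matrix `W` is simply taken as given, the toy data are invented, and a nearest-centroid classifier stands in for the support vector machine named in the abstract. It is not the paper's algorithm.

```python
import numpy as np

def map_source_to_target(X_src, W):
    """Re-represent source rows (n_s x d_s) in the target space.

    W (d_s x d_t) is a hypothetical, already-learned linear mapping;
    learning it is the part the abstract leaves unspecified.
    """
    return X_src @ W

def fit_centroids(X, y):
    """Nearest-centroid training: one mean vector per class.

    A stand-in for the SVM mentioned in the abstract.
    """
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(X, classes, centroids):
    """Assign each row of X to the class with the closest centroid."""
    diffs = X[:, None, :] - centroids[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=2)
    return classes[sq_dists.argmin(axis=1)]

# Invented toy data: labeled 3-D source features, unlabeled 2-D target features.
X_src = np.array([[1.0, 0.0, 1.0],
                  [2.0, 0.0, 2.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 2.0, 0.0]])
y_src = np.array([0, 0, 1, 1])
X_tgt = np.array([[1.2, 0.1],
                  [0.1, 1.4]])

# Assumed mapping from the 3-D source space to the 2-D target space.
W = np.array([[0.5, 0.0],
              [0.0, 1.0],
              [0.5, 0.0]])

# Transfer: labeled source data now live in the target feature space.
X_src_mapped = map_source_to_target(X_src, W)

# Train on the mapped source data, then label the target data.
classes, centroids = fit_centroids(X_src_mapped, y_src)
y_tgt_pred = predict(X_tgt, classes, centroids)
print(y_tgt_pred)
```

Once the source data are expressed in the target feature space, any off-the-shelf classifier can be swapped in for the nearest-centroid step without touching the mapping.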
