Theories of learnability under label noise indicate that the performance of supervised learning methods can be severely degraded by labeling noise in the training samples. However, existing theoretical studies address only binary classification. This paper investigates how structured learning problems are affected by noise. First, we observe that in structured learning, the noise in labeled data is amplified during training, so that the noise rate seen by the training procedure is higher than the error rate of the labeled samples. Traditional noise-learnability theory does not account for this phenomenon in structured learning and therefore underestimates the complexity of the problem. Starting from this noise-amplification phenomenon, we propose a new noise-learnability theory for structured learning. On this basis, we introduce the concept of effective training data size, a measure that can be used in practice to characterize the data quality of noisy learning problems, and we further analyze how structured learning models used in real applications fall back to lower-order models in high-noise settings. Experimental results confirm the correctness of the theory and demonstrate its practical value and guiding significance for cross-lingual projection and co-training.
The performance of supervised machine learning can be severely degraded by noise in labeled data, as indicated by existing, well-studied theories of learning with noisy data. However, these theories focus only on two-class classification problems. This paper studies the relation between noisy examples and their effect on structured learning. First, the paper finds that the noise of labeled data is amplified in structured learning problems, leading to a higher noise rate in the training procedure than in the labeled data itself. Existing theories do not consider this noise amplification in structured learning and thus underestimate the complexity of the learning problems. This paper provides a new theory of learning from noisy data with structured predictions. Based on the theory, the concept of the "effective size of training data" is proposed to describe the quality of noisy training data sets in practice. The paper also analyzes the situations in which structured learning models fall back to lower-order ones in applications. Experimental results confirm the correctness of these theories as well as their practical value for cross-lingual projection and co-training.
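To make the noise-amplification claim concrete, consider a minimal sketch (an illustrative assumption, not the paper's formal construction): in sequence labeling, a factor of order k couples k+1 adjacent labels, so if each token label is independently wrong with probability p, a factor is corrupted with probability 1 − (1 − p)^(k+1) > p. The higher the factor order, the higher the effective noise rate seen during structured training.

```python
import random

def factor_noise_rate(token_error_rate: float, order: int) -> float:
    """Probability that a factor spanning order+1 adjacent labels
    contains at least one mislabeled token (i.i.d. noise assumption)."""
    return 1.0 - (1.0 - token_error_rate) ** (order + 1)

def simulate(token_error_rate: float, order: int,
             n_tokens: int = 100_000, seed: int = 0) -> float:
    """Monte-Carlo check on a synthetic label sequence: fraction of
    sliding-window factors that touch at least one noisy token."""
    rng = random.Random(seed)
    wrong = [rng.random() < token_error_rate for _ in range(n_tokens)]
    factors = [any(wrong[i:i + order + 1]) for i in range(n_tokens - order)]
    return sum(factors) / len(factors)

p = 0.10  # 10% of token labels are wrong
print(factor_noise_rate(p, order=1))  # ≈ 0.19, higher than the token rate 0.10
print(factor_noise_rate(p, order=2))  # ≈ 0.271, amplification grows with order
print(simulate(p, order=1))           # empirical rate, close to 0.19
```

This also hints at why high noise can push a model toward lower-order structure: reducing the factor order lowers the effective noise rate the training procedure must absorb.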