东篱科研大数据发现系统（DRDS）

位置：成果数据库 > 期刊 > 期刊详情页

基于概率校准的集成学习

ISSN号：1001-9081
期刊名称：《计算机应用》
时间：0
分类：TP391[自动化与计算机技术—计算机应用技术;自动化与计算机技术—计算机科学与技术] TP181[自动化与计算机技术—控制科学与工程;自动化与计算机技术—控制理论与控制工程]
作者机构：[1]北京大学信息科学技术学院,北京100871, [2]北京大学软件与微电子学院,北京102600
相关基金：国家自然科学基金资助项目（61232005）; CCF-腾讯科研基金资助项目

关键词：集成学习, 概率校准, 多重共线性, 有放回抽样, 随机子空间, ensemble learning, probability calibration, multiple collinearity, bootstrap sampling, random subspace

中文摘要：

针对原有集成学习多样性不足而导致的集成效果不够显著的问题,提出一种基于概率校准的集成学习方法以及两种降低多重共线性影响的方法。首先,通过使用不同的概率校准方法对原始分类器给出的概率进行校准;然后使用前一步生成的若干校准后的概率进行学习,从而预测最终结果。第一步中使用的不同概率校准方法为第二步的集成学习提供了更强的多样性。接下来,针对校准概率与原始概率之间的多重共线性问题,提出了选择最优（choose-best）和有放回抽样（bootstrap）的方法。选择最优方法对每个基分类器,从原始分类器和若干校准分类器之间选择最优的进行集成;有放回抽样方法则从整个基分类器集合中进行有放回的抽样,然后对抽样出来的分类器进行集成。实验表明,简单的概率校准集成学习对学习效果的提高有限,而使用了选择最优和有放回抽样方法后,学习效果得到了较大的提高。此结果说明,概率校准为集成学习提供了更强的多样性,其伴随的多重共线性问题可以通过抽样等方法有效地解决。

英文摘要：

Since the lackness of diversity may lead to bad performance in ensemble learning, a new two-phase ensemble learning method based on probability calibration was proposed, as well as two methods to reduce the impact of multiple collinearity. In the first phase, the probabilities given by the original classifiers were calibrated using different calibration methods. In the second phase, another classifier was trained using the calibrated probabilities and the final result was predicted. The different calibration methods used in the first phase provided diversity for the second phase, which has been shown to be an important factor to enhance ensemble learning. In order to address the limited improvement due to the correlation between base classifiers, two methods to reduce the multiple collinearity were also proposed, that is, choose-best and bootstrap sampling method. The choose-best method just selected the best base classifier among original and calibrated classifiers; the bootstrap method combined a set of classifiers, which were chosen from the base classifiers with replacement.The experimental results showed that the use of different calibrated probabilities indeed improved the effectiveness of the ensemble; after using the choose-best and bootstrap sampling methods, further improvement was also achieved. It means that probability calibration provides a new way to produce diversity, and the multiple collinearity caused by it can be solved by sampling method.

同期刊论文项目

云存储的隐私保护和安全保障机制

期刊论文 15

同项目期刊论文

一种超椭圆曲线密码处理器并行结构设计

面向云存储的多副本文件完整性验证方案

基于动态域划分的MapReduce安全冗余调度策略

一种基于半监督GHSOM的入侵检测方法

大数据安全与隐私保护

基于节点分割的社交网络属性隐私保护

数据挖掘隐私保护算法研究综述

基于非线性滤波和纹理传输的图像风格化

基于增量式GHSOM神经网络模型的入侵检测研究

数据库形式化安全策略模型建模及分析方法

防止数据泄露的云存储数据分布优化模型

面向社交网络的隐私保护方案

期刊信息

《计算机应用》
北大核心期刊（2011版）

主管单位:四川省科学技术协会
主办单位:四川省计算机学会中国科学院成都分院
主编：张景中
地址：成都市人民南路四段九号科分院计算所
邮编：610041
邮箱：xzh@joca.cn
电话：028-85224283

国际标准刊号：ISSN：1001-9081
国内统一刊号：ISSN：51-1307/TP
邮发代号:62-110

获奖情况:
全国优秀科技期刊一等奖,国家期刊奖提名奖,中国期刊方阵双奖期刊,中文核心期刊,中国科技核心期刊

国内外数据库收录:
俄罗斯文摘杂志,波兰哥白尼索引,美国剑桥科学文摘,英国科学文摘数据库,日本日本科学技术振兴机构数据库,中国中国科技核心期刊,中国北大核心期刊（2004版）,中国北大核心期刊（2008版）,中国北大核心期刊（2011版）,中国北大核心期刊（2014版）,中国北大核心期刊（2000版）

被引量:53679