Extreme Learning Machine Ensemble Learning for Imbalanced Data Based on Multi-Class Resampling
  • ISSN: 0469-5097
  • Journal: 《南京大学学报:自然科学版》 (Journal of Nanjing University: Natural Science Edition)
  • Classification: TP181 [Automation and Computer Technology - Control Science and Engineering; Automation and Computer Technology - Control Theory and Control Engineering]
  • Author affiliations: [1] School of Management, Hebei University, Baoding 071002; [2] Department of Computer Science, Cangzhou Normal University, Cangzhou 061001; [3] College of Mathematics and Information Science, Hebei University, Baoding 071002; [4] Department of Information Engineering, Cangzhou Vocational and Technical College, Cangzhou 061001
  • Funding: National Natural Science Foundation of China (61170040; 71371063)
Chinese Abstract:

Although the extreme learning machine (ELM) has been shown, both in theory and in applications, to offer good generalization performance and very fast training, when handling imbalanced data it is biased toward the majority class and tends to ignore the minority class. Ensemble learning based on data resampling can help ELM overcome its low classification accuracy on the minority class. This paper proposes a class-wise resampling technique and, building on it, develops an ELM ensemble learning method. The method makes full use of the information carried by the minority-class samples, and experimental results show that it clearly outperforms a single ELM learning model. Since resampling is one of the core techniques of big-data processing, the method also offers general guidance for building learning models on imbalanced big data.
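As a rough illustration of the approach described above (not the authors' code), the sketch below trains one ELM on each balanced subset obtained by keeping all minority-class samples and randomly under-sampling the majority class, then combines the members by majority voting. The single-hidden-layer ELM formulation, the label convention (+1 for the minority class, -1 for the majority class), and all names and parameters are illustrative assumptions.

```python
import numpy as np

class ELM:
    """Minimal single-hidden-layer extreme learning machine (textbook form)."""
    def __init__(self, n_hidden=50, rng=None):
        self.n_hidden = n_hidden
        self.rng = rng if rng is not None else np.random.default_rng()

    def fit(self, X, y):
        # Random input weights and biases stay fixed; only the output weights
        # are solved analytically with the Moore-Penrose pseudoinverse.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)        # hidden-layer output matrix
        self.beta = np.linalg.pinv(H) @ y       # output weights
        return self

    def predict(self, X):
        return np.sign(np.tanh(X @ self.W + self.b) @ self.beta)

def undersample_ensemble(X, y, n_models=10, n_hidden=50, seed=0):
    # One balanced subset per member: all minority samples (label +1) plus an
    # equally sized random draw, without replacement, from the majority class (-1).
    rng = np.random.default_rng(seed)
    X_min, y_min = X[y == 1], y[y == 1]
    X_maj, y_maj = X[y == -1], y[y == -1]
    models = []
    for _ in range(n_models):
        idx = rng.choice(len(y_maj), size=len(y_min), replace=False)
        X_bal = np.vstack([X_min, X_maj[idx]])
        y_bal = np.concatenate([y_min, y_maj[idx]])
        models.append(ELM(n_hidden, rng).fit(X_bal, y_bal))
    return models

def vote(models, X):
    # Unweighted majority vote over the ensemble members.
    return np.sign(np.stack([m.predict(X) for m in models]).sum(axis=0))
```

In this sketch each ensemble member sees a balanced training set, which is what counteracts the bias toward the majority class; since the members are trained independently before voting, they could also be trained in parallel, as the English abstract notes.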

English Abstract:

The extreme learning machine (ELM) has been shown, in both theory and applications, to have good generalization performance and fast training speed. However, because it favors the majority class and ignores the minority class, leading to low classification accuracy for the minority class, ELM cannot handle imbalanced data effectively. Imbalanced data are common in practice, for example in identifying fraudulent credit card transactions, predicting preterm births, and learning word pronunciations. The main strategies for imbalanced data classification are resampling, ensemble learning, and cost-sensitive learning, and the basic sampling methods are under-sampling and over-sampling. The core idea of the proposed method is to combine multiple random under-samplings and, on this basis, to develop an ELM-based ensemble learning algorithm, which effectively relieves the problem of low classification accuracy on the minority class. To evaluate classification performance on imbalanced data more reasonably, the experiments use the F-measure and G-mean values as evaluation criteria; the higher these values, the better the classification performance on the minority class. The experimental results demonstrate that the proposed method achieves higher F-measure and G-mean values than a single ELM learning model, which implies that ELM ensemble learning based on multiple under-samplings can improve the classification performance of the minority class. In addition, every classifier is independent of the others before voting, so the resampling method can be implemented in parallel: the large data set is first decomposed into many small data sets, each of which is then learned by an ELM, which improves computing speed. Because resampling is one of the core technologies for processing large data, the method has general guiding significance for building learning models on large imbalanced data.
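For reference, the two evaluation criteria named in the abstract can be computed from a binary confusion matrix as sketched below, with the minority class treated as the positive class; the function and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def f_measure_and_g_mean(y_true, y_pred):
    # Confusion-matrix counts with the minority class taken as positive (+1).
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == -1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == -1))
    tn = np.sum((y_true == -1) & (y_pred == -1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0        # sensitivity (TPR)
    specificity = tn / (tn + fp) if tn + fp else 0.0   # TNR
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    g_mean = float(np.sqrt(recall * specificity))      # geometric mean of TPR and TNR
    return f_measure, g_mean
```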

Related outputs from the same project: 77 journal papers, 17 conference papers, 2 books
Journal Information
  • 《南京大学学报:自然科学版》 (Journal of Nanjing University: Natural Science Edition)
  • China Science and Technology Core Journal
  • Supervising authority: Ministry of Education of the People's Republic of China
  • Sponsor: Nanjing University
  • Editor-in-Chief: Gong Changde
  • Address: Editorial Office of the Journal of Nanjing University (Natural Science), Nanjing University, 22 Hankou Road, Nanjing
  • Postal code: 210093
  • Email: xbnse@netra.nju.edu.cn
  • Phone: 025-83592704
  • International standard serial number: ISSN 0469-5097
  • Domestic unified serial number: CN 32-1169/N
  • Postal distribution code: 28-25
  • Awards:
  • China Natural Science Core Journal; "Double Effect" journal of the China Journal Phalanx (中国期刊方阵)
  • Indexed by databases at home and abroad:
  • Chemical Abstracts (web edition, USA); Mathematical Reviews (web edition, USA); Zentralblatt MATH (Germany); China Science and Technology Core Journal; Peking University Core Journal (2000, 2004, 2008, 2011, and 2014 editions)
  • Citations: 9316