Data Stream Mining Model and Algorithm with Recalling and Forgetting Mechanisms

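ISSN: 1000-9825
Journal: Journal of Software (《软件学报》)
Date: 0
Classification: TP181 [Automation and Computer Technology: Control Science and Engineering; Automation and Computer Technology: Control Theory and Control Engineering]
Author affiliations: [1] School of Computer and Information Engineering, Hunan University of Commerce, Changsha, Hunan 410205; [2] State Key Laboratory of High Performance Computing (National University of Defense Technology), Changsha, Hunan 410073
Funding: National Natural Science Foundation of China (61272141, 60905032, 61120106005, 61273232)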

Keywords: data stream mining, concept drift, recalling and forgetting, Ebbinghaus forgetting curve, ensemble pruning

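Abstract (translated from the Chinese):

Ensemble-based data stream mining is an important approach to learning from data streams with concept drift. To address the shortcomings of traditional ensemble-based data stream mining, this paper introduces the human recalling and forgetting mechanisms into data stream mining and proposes a memorizing based data stream mining model, MDSM. The model treats base classifiers as knowledge acquired by the system. Through the "recalling and forgetting" mechanism, base classifiers that were useful in the past stay in a "memory repository" because of their high memory strength, which improves the stability of prediction. In addition, the base classifiers that currently classify well are selected from the memory repository to take part in ensemble prediction, which improves adaptability to concept changes. Based on the MDSM model, an ensemble-based data stream mining algorithm, MAE (memorizing based adaptive ensemble), is proposed. The algorithm uses the Ebbinghaus forgetting curve to design the system's forgetting mechanism and uses ensemble pruning to simulate the human "recalling" mechanism. In a comparison with four typical data stream mining algorithms, the results show that MAE achieves high classification accuracy and strong overall adaptability to concept drift, and that it adapts particularly well to recurring concept drift and to the complex concept drift found in real applications. It not only adapts quickly to new concept changes but also effectively resists the impact of random concept fluctuations on system performance.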

English abstract:

Building an ensemble of classifiers on sequential chunks of training instances is a popular strategy for mining data streams with concept drift. To address the limitations of existing approaches, this paper introduces human recalling and forgetting mechanisms into a data stream mining system and proposes a memorizing based data stream mining (MDSM) model. The model treats base classifiers as learned knowledge. Through the "recalling and forgetting" mechanism, the classifiers that were most useful in the past are retained in a "memory repository", which improves stability under random concept drifts. The best classifiers for the current data chunk are selected for prediction, which gives high adaptability to different concept drifts. Based on MDSM, the paper puts forward a new algorithm, MAE (memorizing based adaptive ensemble). MAE uses the Ebbinghaus forgetting curve as its forgetting mechanism and adopts ensemble pruning to emulate the "recalling" mechanism. In a comparison with four traditional data stream mining approaches, MAE achieves high and stable accuracy with moderate training time. The results also show that MAE adapts well to different kinds of concept drift, especially in applications with recurring or complex concept drifts.
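
The abstract describes the mechanism only in words, so a small sketch may help make it concrete. The Python below is an illustration under stated assumptions, not the MAE algorithm from the paper. It assumes the common textbook form of the Ebbinghaus curve, R = exp(-t / S), a memory strength that grows by 1 on each recall, and a plain top-k accuracy ranking as a stand-in for the paper's ensemble pruning step. The names `MemoryItem`, `retention`, `step`, `predict`, and the parameters `k` and `forget_below` are all hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Classifier = Callable[[float], int]          # hypothetical: one feature -> label
Chunk = Sequence[Tuple[float, int]]          # labelled instances of one data chunk


@dataclass
class MemoryItem:
    clf: Classifier
    strength: float = 1.0    # memory strength S (assumed update rule: +1 per recall)
    last_recall: int = 0     # index of the chunk at which the item was last recalled


def retention(item: MemoryItem, now: int) -> float:
    """Assumed Ebbinghaus-style retention R = exp(-t / S), t = chunks since last recall."""
    return math.exp(-(now - item.last_recall) / item.strength)


def accuracy(clf: Classifier, chunk: Chunk) -> float:
    """Fraction of instances in the chunk that clf labels correctly."""
    return sum(clf(x) == y for x, y in chunk) / len(chunk)


def step(memory: List[MemoryItem], new_clf: Classifier, chunk: Chunk,
         now: int, k: int = 2, forget_below: float = 0.2) -> List[MemoryItem]:
    """Process one chunk: memorize the new classifier, recall the best k, forget the rest."""
    memory.append(MemoryItem(new_clf, last_recall=now))
    # "Recall": top-k by accuracy on the current chunk (stand-in for ensemble pruning).
    ranked = sorted(memory, key=lambda m: accuracy(m.clf, chunk), reverse=True)
    recalled = ranked[:k]
    for m in recalled:
        m.strength += 1.0        # recalled knowledge is reinforced
        m.last_recall = now
    # "Forget": drop items whose retention has decayed below the threshold.
    memory[:] = [m for m in memory if retention(m, now) >= forget_below]
    return recalled


def predict(recalled: List[MemoryItem], x: float) -> int:
    """Majority vote of the recalled classifiers."""
    votes = [m.clf(x) for m in recalled]
    return Counter(votes).most_common(1)[0][0]
```

In this sketch a classifier that matched an earlier concept survives in the repository as long as its retention stays above `forget_below`, so it can be recalled when that concept recurs, while classifiers that are never recalled decay and are eventually dropped.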

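Projects with papers in the same journal

Research on Proactive Fault Tolerance Techniques for Supercomputers Based on Online Machine Learning

Journal papers: 1

Research on Key Fault Tolerance Techniques for Exascale High-Productivity Computing

Journal papers: 12

Research on Trusted Service Composition and Runtime Assurance in Cloud Computing Environments

Journal papers: 5

Research on Efficient Online Machine Learning Techniques for Network Attack Behavior

Journal papers: 5; conference papers: 4

Journal papers from the same project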

Hybrid hierarchy storage system in MilkyWay-2 supercomputer

MilkyWay-2 supercomputer: system and application

An SSD Design Optimization Method for RAID Arrays

PartialRC: A Partial Recomputing Method for Efficient Fault Recovery on GPGPUs

Reorder Write Sequence by Hetero-Buffer to Extend SSD's Lifespan

CSWL: Cross-SSD Wear-Leveling Method in SSD-Based RAID Systems for System Endurance and Performance

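A Fault-Tolerant Parallel Multigrid Algorithm

A Fault Tolerance Period Optimization Method for Computational Fluid Dynamics Application Development Frameworks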

Extending the GA Model for Heterogeneous Architectures

The TH Express high performance interconnect networks

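High-Speed Communication for the Hybrid Hierarchy File System H2FS Based on the Tianhe-2 High-Speed Interconnect Network

A Self-Learning Differential Evolution Algorithm for Dynamic Multi-Center Problems

A Collaborative Filtering Recommendation Algorithm Based on User Similarity

Robust Synchronization of Complex Dynamical Networks

Generalized Synchronization and Parameter Identification of Two Heterogeneous Complex Networks

Classification and Comparison of Ensemble Pruning Algorithms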

A fast ensemble pruning algorithm based on pattern mining process

A Fast Ensemble Pruning Algorithm Based on FP-Tree

Design and Implementation of an Ensemble Pruning Development Platform

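Journal information

Journal of Software (《软件学报》)
PKU Core Journal (2011 edition)

Supervising body: Chinese Academy of Sciences
Sponsors: Institute of Software, Chinese Academy of Sciences; China Computer Federation
Editor-in-chief: Zhao Chen (赵琛)
Address: P.O. Box 8718, Beijing, Institute of Software, Chinese Academy of Sciences
Postal code: 100190
Email: jos@iscas.ac.cn
Telephone: 010-62562563

International standard serial number: ISSN 1000-9825
Domestic serial number: CN 11-2560/TP
Postal distribution code: 82-367

Awards:
Selected in 2001 as a "Double Hundred Journal" of the China Journal Phalanx; awarded First Prize for Outstanding Scientific and Technical Journals of the Chinese Academy of Sciences in 2000

Indexed in:
Abstracts Journal (Russia), Mathematical Reviews, online edition (USA), Index Copernicus (Poland), Zentralblatt MATH (Germany), Scopus (Netherlands), Engineering Index (USA), Cambridge Scientific Abstracts (USA), Science Abstracts database (UK), Japan Science and Technology Agency database (Japan), China Science and Technology Core Journals (China), PKU Core Journals (China; 2000, 2004, 2008, 2011 and 2014 editions)

Citations: 54609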