In recent years, the class imbalance problem has become a research hotspot in artificial intelligence, machine learning, and data mining, and many practical and effective methods have been proposed for it. However, recent studies have shown that not all imbalanced classification tasks are actually harmful: applying specially designed class imbalance learning algorithms to harmless tasks rarely improves, and may even degrade, classification performance, while potentially increasing training time substantially. To address this problem, we propose a harm pre-evaluation strategy. The strategy uses leave-one-out cross validation (LOOCV) to measure the classification performance on the training set, and from this performance computes a new index, called the Harmfulness Measure (HM), that quantifies the degree of harm and thereby guides the selection of an appropriate learning algorithm. Experiments on eight class-imbalanced datasets show that the proposed strategy is effective and feasible.
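The pre-evaluation procedure described above can be sketched in a few lines. The abstract does not give the actual HM formula, so the `hm` function below is a hypothetical placeholder that contrasts minority-class recall with overall accuracy under LOOCV on the training set; the dataset, classifier, and threshold logic are likewise illustrative assumptions, not the paper's method.

```python
# Illustrative sketch of an LOOCV-based harm pre-evaluation.
# NOTE: hm() below is a hypothetical stand-in for the paper's
# Harmfulness Measure, whose exact definition is not given here.
from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, recall_score

# A small skewed toy dataset (roughly 9:1 class ratio).
X, y = make_classification(n_samples=200, weights=[0.9, 0.1], random_state=0)

# Step 1: obtain LOOCV predictions on the training set.
pred = cross_val_predict(DecisionTreeClassifier(random_state=0), X, y,
                         cv=LeaveOneOut())

def hm(y_true, y_pred):
    """Hypothetical harmfulness score: a large gap between overall
    accuracy and minority-class recall suggests the imbalance is
    genuinely harmful to the classifier."""
    return (accuracy_score(y_true, y_pred)
            - recall_score(y_true, y_pred, pos_label=1))

# Step 2: quantify harm; step 3: use it to guide algorithm choice.
score = hm(y, pred)
# A score near zero suggests plain learning suffices; a large positive
# score argues for a class-imbalance learning algorithm.
print(round(score, 3))
```

In this sketch the decision rule is a single scalar comparison, which matches the abstract's intent (guide the choice of learning algorithm) without claiming the paper's actual threshold or formula.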