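A Cascade-Based Classification Method for Class-Imbalanced Data
  • ISSN: 0469-5097
  • Journal: Journal of Nanjing University (Natural Sciences) (《南京大学学报:自然科学版》)
  • Date: 0
  • Classification: TP18 [Automation and Computer Technology: Control Science and Engineering; Automation and Computer Technology: Control Theory and Control Engineering]
  • Author affiliations: [1] State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China; [2] College of Computing, Georgia Institute of Technology, Atlanta, GA 30332-0280, USA
  • Funding: National Science Fund for Distinguished Young Scholars (60325207); Key Project of the Natural Science Foundation of Jiangsu Province (BK2004001); National "973" Program (2002CB312002)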
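Abstract (translated from Chinese):

In real-world problems, the number of examples often differs greatly from one class to another, and traditional machine learning methods have difficulty classifying minority-class examples correctly. When the minority class is sufficiently important, this leads to substantial losses. Learning from data with an imbalanced class distribution has therefore become one of the challenges machine learning currently faces. Inspired by the cascade model used in computer vision, this paper proposes BalanceCascade, a classification method for imbalanced data. The method progressively shrinks the majority class so that the data set moves toward balance, and the series of classifiers trained along the way are combined as an ensemble to classify the examples to be predicted. Experimental results show that the method effectively improves classification performance on imbalanced data, especially when classification performance is severely affected by the imbalance.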

English abstract:

In machine learning and data mining, many factors can influence the performance of a learning system in real-world applications. Class imbalance is one of them: training examples in one class heavily outnumber the examples in another class. Classifiers generally have difficulty learning concepts from the minority class, and in the many applications where the minority class is more important than the majority class, this causes great loss. The face detection problem exhibits severe class imbalance, which greatly decreases detection speed, and the cascade structure was proposed to accelerate the learning process. A cascade is a classifier system made of a sequence of n node classifiers. At the beginning, all training examples are available to train the first node classifier. Then all positive examples and only a subset of the negative examples are passed to the next node; the negative examples correctly classified by the first node are dropped. This procedure repeats until all node classifiers are trained. A test example is passed to the next node if the current node recognizes it as positive; otherwise it is rejected immediately as negative. The learning goal of a cascade node classifier is quite different from that of a usual classifier, in that every node aims for a high detection rate and only a moderate false alarm rate. The cascade as a whole can nevertheless achieve both a high overall detection rate and a low overall false alarm rate. Every time training examples are passed to the next node, some negatives are dropped, so the training set contains fewer negatives than it did at the previous node. From the point of view of class imbalance, this means a more balanced training set than at previous nodes. In the early nodes of a cascade it is quite easy to achieve the learning goal, i.e. to train a classifier with a high detection rate and only a moderate false alarm rate. However, it becomes harder in deeper nodes, since the negative examples in these nodes are fals
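
The cascade training and testing procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' BalanceCascade implementation. The node classifier is a hypothetical single-feature threshold chosen for brevity, each node is assumed to keep every positive (a 100% detection rate), and the toy data and the names `train_node`, `train_cascade` and `cascade_predict` are invented for illustration.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]
Node = Tuple[int, float]  # (feature index, threshold): positive when x[feature] >= threshold


def train_node(pos: List[Point], neg: List[Point]) -> Node:
    """Hypothetical node learner: among single-feature thresholds that keep
    every positive, pick the one that rejects the most negatives."""
    best, best_rejected = (0, min(p[0] for p in pos)), -1
    for f in range(len(pos[0])):
        thr = min(p[f] for p in pos)  # lowest positive value keeps all positives
        rejected = sum(1 for n in neg if n[f] < thr)
        if rejected > best_rejected:
            best, best_rejected = (f, thr), rejected
    return best


def node_predict(node: Node, x: Point) -> bool:
    f, thr = node
    return x[f] >= thr


def train_cascade(pos: List[Point], neg: List[Point], max_nodes: int = 5):
    """Train nodes in sequence; each node sees all positives but only the
    negatives that earlier nodes failed to reject (their false alarms)."""
    nodes: List[Node] = []
    neg_counts: List[int] = []  # negatives available to each trained node
    remaining = list(neg)
    for _ in range(max_nodes):
        if not remaining:
            break
        node = train_node(pos, remaining)
        survivors = [n for n in remaining if node_predict(node, n)]
        if len(survivors) == len(remaining):
            break  # this node rejects nothing new, so stop adding nodes
        nodes.append(node)
        neg_counts.append(len(remaining))
        remaining = survivors
    return nodes, neg_counts


def cascade_predict(nodes: List[Node], x: Point) -> bool:
    """A test example is positive only if every node accepts it;
    all() stops at the first node that rejects it."""
    return all(node_predict(node, x) for node in nodes)


# Toy data: 3 positives against 7 negatives.
pos = [(5.0, 5.0), (6.0, 4.0), (4.0, 6.0)]
neg = [(1.0, 1.0), (2.0, 7.0), (7.0, 2.0), (3.0, 3.0),
       (1.0, 5.0), (5.0, 1.0), (4.5, 4.5)]

nodes, neg_counts = train_cascade(pos, neg)
print(nodes)       # [(0, 4.0), (1, 4.0)]
print(neg_counts)  # [7, 3]
```

On this toy data the first node trains against 7 negatives and the second against only 3, which is the rebalancing effect the abstract points to. The one negative that survives both nodes, `(4.5, 4.5)`, shows why deeper nodes face a harder problem.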

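Journal information
  • Journal of Nanjing University (Natural Sciences) (《南京大学学报:自然科学版》)
  • China Science and Technology Core Journal
  • Supervising body: Ministry of Education of the People's Republic of China
  • Sponsor: Nanjing University
  • Editor-in-chief: Gong Changde (龚昌德)
  • Address: Editorial Office of the Journal of Nanjing University (Natural Sciences), 22 Hankou Road, Nanjing
  • Postal code: 210093
  • Email: xbnse@netra.nju.edu.cn
  • Telephone: 025-83592704
  • International standard serial number: ISSN 0469-5097
  • Domestic serial number: CN 32-1169/N
  • Postal distribution code: 28-25
  • Awards: Chinese Natural Science Core Journal; "Double Effect" journal of the China Periodical Matrix
  • Indexed in: Chemical Abstracts (USA, online edition); Mathematical Reviews (USA, online edition); Zentralblatt MATH (Germany); China Science and Technology Core Journals; Peking University Core Journals (2000, 2004, 2008, 2011 and 2014 editions)
  • Citations: 9316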