Classification is one of the central research topics in machine learning. Existing classification methods are relatively mature and generally perform well on balanced data. In the real world, however, data are often imbalanced, whereas existing classifiers are designed under the assumption that the class distribution is roughly balanced. Applying these methods to imbalanced data therefore degrades classifier performance, which makes research on classification methods for imbalanced data sets particularly important. To give readers a clearer picture of the current state of research on imbalanced data classification and of where it is likely to go, this paper surveys the related work and outlines the key open problems that are attracting researchers' attention.