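For the extreme case of imbalanced classification, in which very few training samples, or even only a single instance, are available, this paper proposes a single-instance classification algorithm. The method uses a sphere as the classification surface and maximizes the radius of this sphere subject to two constraints: the single instance of the target class must lie inside the sphere, and the negative class should lie outside the sphere as far as possible. The method handles linearly separable data distributions effectively. When the input samples have a highly nonlinear structure, the algorithm uses a kernel mapping to transform the nonlinearly separable problem in the low-dimensional input space into a possibly linearly separable problem in a high-dimensional feature space, expresses that problem in terms of inner products, and finally solves the original problem in the feature space via the kernel trick. Experiments on benchmark and real-world datasets verify that the single-instance classification algorithm is effective on imbalanced data.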
To address the extreme situation in which only a few target examples, or only one, are available for training a classifier, a single-sample classification algorithm is presented. A hypersphere is used as the classification surface, and its radius is maximized under the constraints that the single target sample lies inside the hypersphere and the outliers lie outside it as far as possible. This formulation handles linearly separable data, but a sphere in the input space fails when the distribution of the input patterns is complex. In that case the classifier applies kernel methods: a nonlinear mapping into a high-dimensional feature space increases the likelihood that the patterns are linearly separable there, and the original classification problem is then solved in that space through the kernel trick. Experiments on various synthetic and UCI datasets verify that the algorithm can effectively handle unbalanced data classification.
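The abstract does not give the exact optimization problem, so the following is only a minimal sketch under simplifying assumptions, not the authors' method. It fixes the sphere's center at the single target instance `x0` (the paper optimizes the sphere more generally) and works entirely through a kernel, using the kernel-trick identity ||φ(x) − φ(c)||² = k(x,x) − 2k(x,c) + k(c,c). The squared radius is chosen as large as possible while keeping the negatives outside; the hypothetical `outside_quantile` parameter is an illustrative stand-in for the "as far as possible" soft constraint, letting a small share of negatives fall inside. The names `rbf_kernel`, `feature_space_sq_dist`, and `SingleInstanceSphere` are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian RBF kernel k(a, b) = exp(-gamma * ||a - b||^2) for 1-D vectors."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))


def feature_space_sq_dist(x, c, kernel):
    """Squared distance ||phi(x) - phi(c)||^2 via the kernel trick,
    using only inner products: k(x,x) - 2 k(x,c) + k(c,c)."""
    return kernel(x, x) - 2.0 * kernel(x, c) + kernel(c, c)


class SingleInstanceSphere:
    """Hypothetical simplified single-instance classifier.

    Assumption: the sphere is centered on the single target instance x0 in
    feature space; only the radius is fitted, from the negative samples.
    """

    def __init__(self, kernel=rbf_kernel, outside_quantile=0.0):
        self.kernel = kernel
        # Assumed soft-constraint knob: approximate fraction of negatives
        # allowed inside the sphere (0.0 keeps every negative outside).
        self.outside_quantile = outside_quantile

    def fit(self, x0, negatives):
        self.center = np.asarray(x0, dtype=float)
        d2 = np.array([feature_space_sq_dist(n, self.center, self.kernel)
                       for n in negatives])
        # Largest squared radius such that (roughly) the requested share of
        # negatives still has a feature-space distance >= the radius.
        self.radius_sq = float(np.quantile(d2, self.outside_quantile))
        return self

    def predict(self, X):
        """Return True for points strictly inside the sphere (target class)."""
        return np.array([feature_space_sq_dist(x, self.center, self.kernel)
                         < self.radius_sq for x in X])
```

For example, `SingleInstanceSphere(kernel=lambda a, b: rbf_kernel(a, b, gamma=0.1)).fit([0, 0], [[3, 0], [0, 4], [5, 5]])` places the boundary at the nearest negative, `[3, 0]`. A nearby point such as `[0.5, 0]` is then predicted as the target class, while `[3, 0]` and `[10, 10]` are not. Swapping in a linear kernel `k(a, b) = a·b` reduces the feature-space distance to the ordinary squared Euclidean distance, which corresponds to the input-space sphere described for linearly separable data.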