By learning a feature transformation matrix, samples can be mapped into a new space in which a given distance metric is better suited to measuring the distances between them. Building on this idea, this paper proposes a kNN-oriented feature transformation method that improves the classification performance of the k-nearest neighbor (kNN) algorithm on imbalanced data sets. The method learns a linear feature transformation matrix by maximizing an objective function based on the g-mean metric (the geometric mean of the per-class recalls), so that in the new space samples of the same class are as close as possible while samples of different classes are as far apart as possible. Because the g-mean-based objective fully accounts for the characteristics of the rare class, it effectively ensures that kNN achieves better classification performance on rare-class data in the new space. Experimental results on UCI data sets show that the proposed method effectively improves the generalization ability of kNN on rare-class (imbalanced) problems, and that it also shows clear advantages over the traditional PCA and LDA.
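The abstract does not state the exact form of the g-mean objective or how it is optimized, so the following is only a minimal sketch under stated assumptions, not the authors' method. It illustrates the ingredients named above: kNN distances computed after a linear map x → Lx (equivalent to a Mahalanobis distance with M = LᵀL), the g-mean taken as the geometric mean of per-class recall, and a search for L that maximizes the leave-one-out g-mean. The optimizer is a hypothetical gradient-free random search swapped in purely for illustration; the function names (`transform`, `knn_predict_loo`, `g_mean`, `learn_transform`) and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def transform(L, x):
    """Apply the linear map x -> L x, with L given as a list of rows."""
    return [sum(lij * xj for lij, xj in zip(row, x)) for row in L]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def knn_predict_loo(X, y, L, k):
    """Leave-one-out kNN predictions in the space induced by L."""
    Z = [transform(L, x) for x in X]
    preds = []
    for i, zi in enumerate(Z):
        # Rank every other sample by its distance to sample i in the new space.
        neighbors = sorted((sq_dist(zi, zj), y[j]) for j, zj in enumerate(Z) if j != i)
        # Majority vote among the k nearest; ties go to the label seen first (nearest).
        votes = Counter(label for _, label in neighbors[:k])
        preds.append(votes.most_common(1)[0][0])
    return preds


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall; it is 0 if any class is entirely missed."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return math.prod(recalls) ** (1 / len(recalls))


def learn_transform(X, y, k=3, iters=200, step=0.3, seed=0):
    """Hypothetical optimizer: random-perturbation hill climbing on L that
    keeps a candidate whenever its leave-one-out g-mean does not decrease."""
    rng = random.Random(seed)
    d = len(X[0])
    L = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # start at identity
    best = g_mean(y, knn_predict_loo(X, y, L, k))
    for _ in range(iters):
        cand = [[v + rng.gauss(0, step) for v in row] for row in L]
        score = g_mean(y, knn_predict_loo(X, y, cand, k))
        if score >= best:
            L, best = cand, score
    return L, best


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical toy data: feature 0 separates the classes, feature 1 is large-scale noise.
    X = [[rng.gauss(0.0, 0.3), rng.uniform(-10, 10)] for _ in range(20)]  # majority class 0
    X += [[rng.gauss(1.0, 0.3), rng.uniform(-10, 10)] for _ in range(5)]  # rare class 1
    y = [0] * 20 + [1] * 5
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print("g-mean, identity:", g_mean(y, knn_predict_loo(X, y, identity, 3)))
    L, score = learn_transform(X, y, k=3)
    print("g-mean, learned L:", score)
```

Because the search starts from the identity and only accepts non-worsening candidates, the learned L never scores below plain kNN on the leave-one-out g-mean. Using the g-mean rather than overall accuracy is what makes the rare class matter: a transform that ignores class 1 drives its recall, and hence the whole g-mean, toward zero.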