Logistic regression (LR) is one of the most important classification models in machine learning and pattern recognition, offering good interpretability and generalization ability. In this paper, the LR model is applied to the class imbalance problem, and a method named LR for class imbalance (LRCI) is proposed to handle imbalanced data. To take data imbalance fully into account, two objective functions are constructed, a g-mean based metric (GBM) and an f-measure based metric (FBM), to supervise the learning of the LRCI model parameters, thereby ensuring that the learned model attains both high accuracy and high recall. Experimental results on UCI datasets show that LRCI effectively improves recall, g-mean, and f-measure while preserving the high accuracy of LR. Compared with other state-of-the-art class imbalance learning models, LRCI shows a clear advantage.
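For readers unfamiliar with the two metrics that GBM and FBM are built on, the following minimal sketch computes g-mean and f-measure from binary predictions. It assumes the commonly used definitions (g-mean as the geometric mean of recall and specificity, f-measure as the weighted harmonic mean of precision and recall); the abstract does not state the exact form of the GBM and FBM objectives, so this illustrates the underlying metrics only, not the LRCI training procedure. The function names and the `positive` and `beta` parameters are illustrative choices.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_pred, positive=1):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of recall and specificity (assumed definition)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(recall * specificity)


def f_measure(y_true, y_pred, positive=1, beta=1.0):
    """Weighted harmonic mean of precision and recall (assumed definition)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Imbalanced toy data: 2 positives, 8 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
all_negative = [0] * 10  # a classifier that ignores the minority class

print(accuracy(y_true, all_negative))   # high accuracy despite missing every positive
print(g_mean(y_true, all_negative))     # 0.0
print(f_measure(y_true, all_negative))  # 0.0
```

The toy data shows why accuracy alone is a poor training signal under imbalance: a classifier that always predicts the majority class scores 0.8 accuracy while both g-mean and f-measure drop to zero.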