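In recent years, customer targeting modeling has become a research hotspot in customer relationship management. To address the highly imbalanced class distribution of the training samples used for customer targeting modeling, this paper first proposes a hybrid sampling method. Further, the group method of data handling (GMDH) neural network is introduced into customer feature selection, and a new feature selection algorithm, Log-GMDH, is proposed. The algorithm improves the traditional GMDH network model in two respects: the choice of transfer function and the construction of a new external criterion. Finally, the proposed hybrid sampling, Log-GMDH, and the Logistic regression classification algorithm are combined to build the customer targeting model LogGMDH-Logistic. An empirical analysis on the customer targeting dataset of a car insurance company from the CoIL2000 prediction competition shows that the LogGMDH-Logistic model not only outperforms some existing customer targeting models but is also highly interpretable.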
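As a rough illustration of two ingredients named in the abstract, the sketch below balances a binary training set and scores a ranking by hit rate. It is not the authors' implementation. The abstract does not define hybrid sampling, so the code assumes it means undersampling the majority class while oversampling the minority class to a common size (here the midpoint of the two class counts). The hit rate is assumed to be the share of true responders among the top-scored fraction of customers. The names `hybrid_sample`, `hit_rate`, and `top_fraction` are hypothetical.

```python
import random


def hybrid_sample(rows, labels, seed=0):
    """Balance a binary training set (assumed reading of 'hybrid sampling').

    The majority class is undersampled and the minority class is oversampled
    so that both end up with the same number of samples.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")

    # Assumption: each class is resized to the midpoint of the two class counts.
    target = (len(pos) + len(neg)) // 2

    def resize(indices):
        if len(indices) >= target:
            # Undersample without replacement.
            return rng.sample(indices, target)
        # Keep every original sample, then oversample with replacement.
        return indices + rng.choices(indices, k=target - len(indices))

    chosen = resize(pos) + resize(neg)
    rng.shuffle(chosen)
    return [rows[i] for i in chosen], [labels[i] for i in chosen]


def hit_rate(scores, labels, top_fraction=0.2):
    """Share of true responders among the top-scored fraction of customers."""
    k = max(1, int(len(scores) * top_fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in ranked[:k]) / k
```

In a pipeline of the kind the abstract describes, a function like `hybrid_sample` would run on the training set only, and a criterion like `hit_rate` would be evaluated on held-out data when comparing candidate feature subsets.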
In recent years, database marketing has become a hot topic in customer relationship management (CRM), and customer targeting modeling is one of its most important issues. Essentially, customer targeting modeling is a binary classification problem: all customers are divided into two categories, those who respond to the company's marketing activities and those who do not. This study combines group method of data handling (GMDH) neural networks, a resampling technique, and the Logistic regression classification algorithm to construct the customer targeting model LogGMDH-Logistic. The model consists of three phases: (1) To address the highly imbalanced class distribution of the training set for customer targeting modeling, a new resampling method (hybrid sampling) is proposed to balance the class distribution of the training set; (2) To select key features from the large number of characteristics describing the customers, the GMDH neural network is introduced and a new feature selection algorithm, Log-GMDH, is presented, which improves the traditional GMDH neural network model in both the selection of the transfer function and the construction of a new external criterion. For the transfer function, it replaces the linear transfer function of the traditional GMDH neural network with the non-linear Logistic regression function; for the external criterion, it replaces the regularization criterion of the traditional GMDH neural network with the hit rate, which suits customer targeting modeling; (3) The training set is mapped onto the selected feature subset, the Logistic regression classification algorithm is trained on it, and the response probability of potential customers is predicted. The experiment is carried out on a customer targeting dataset of a car insurance company from the CoIL2000 prediction competition, and the results show that the LogGMDH-Logistic model is superior to some existing 
customer targeting models bo