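To address data drift and the imbalanced distribution of customer value in data mining, a new method combining staged clustering with a cost-sensitive support vector machine is adopted. The method first clusters all customers to obtain groups of customers with similar characteristics, and then clusters cities by the posterior probability that a customer in a given region belongs to each customer group. City groups with similar posterior probability distributions are considered to have similar customer structures, and the customers of each city group form a new customer sample. Cost-sensitive classification is then performed on each sample separately to complete the customer segmentation. Comparative experiments show that the method improves overall prediction accuracy and the ability to identify high-value customers, and lowers the misclassification cost of the model. While maintaining classification accuracy, the improved method better helps enterprises target high-end customers and dynamically adjust their regional market strategies.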
To address data drift and asymmetric misclassification costs in customer segmentation, a cost-sensitive learning method integrated with two-step clustering is proposed. The method first applies k-means clustering to the posterior probability distribution of each region in order to group similar regions together, and then uses a cost-sensitive support vector machine to segment customers within each region group. The results show that clustering regions by the similarity of their customer segmentation structure improves overall accuracy, and that the proposed cost-sensitive support vector machine distinguishes high-value customers more effectively than the original support vector machine.
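The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn's `KMeans` for both clustering stages, estimates the per-region posterior by simple counting, and stands in for the cost-sensitive support vector machine with `SVC` and per-class weights (`class_weight`), which scale the penalty on misclassifying each class. The function names, the toy data, and the 1:5 cost ratio are hypothetical choices made only for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC


def region_posteriors(customer_cluster, region, n_clusters, n_regions):
    """Row r estimates P(customer group = k | region = r) by counting."""
    counts = np.zeros((n_regions, n_clusters))
    np.add.at(counts, (region, customer_cluster), 1.0)
    totals = counts.sum(axis=1, keepdims=True)
    return counts / np.maximum(totals, 1.0)


def fit_two_stage(X, y, region, n_customer_clusters=3, n_region_groups=2,
                  class_weight=None, seed=0):
    n_regions = int(region.max()) + 1
    # Step 1: cluster all customers into groups with similar features.
    customer_km = KMeans(n_clusters=n_customer_clusters, n_init=10,
                         random_state=seed).fit(X)
    # Step 2: cluster regions by their posterior distribution over groups.
    post = region_posteriors(customer_km.labels_, region,
                             n_customer_clusters, n_regions)
    region_group = KMeans(n_clusters=n_region_groups, n_init=10,
                          random_state=seed).fit_predict(post)
    # Step 3: one class-weighted SVM per region group (assumed stand-in
    # for the cost-sensitive SVM; the weights encode misclassification cost).
    models = {}
    for g in range(n_region_groups):
        mask = region_group[region] == g
        if np.unique(y[mask]).size < 2:
            continue  # SVC needs both classes present to fit
        models[g] = SVC(kernel="rbf", class_weight=class_weight)
        models[g].fit(X[mask], y[mask])
    return region_group, post, models


def make_toy_data(seed=0):
    """Hypothetical data: 6 regions, 3 customer types, type 1 is high value."""
    rng = np.random.default_rng(seed)
    centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
    mix = np.array([[0.8, 0.1, 0.1]] * 3 + [[0.1, 0.8, 0.1]] * 3)
    X, y, region = [], [], []
    for r, p in enumerate(mix):
        kind = rng.choice(3, size=80, p=p)
        X.append(centers[kind] + rng.normal(scale=0.5, size=(80, 2)))
        y.append((kind == 1).astype(int))
        region.append(np.full(80, r))
    return np.vstack(X), np.concatenate(y), np.concatenate(region)


X, y, region = make_toy_data()
region_group, post, models = fit_two_stage(
    X, y, region, class_weight={0: 1.0, 1: 5.0})  # assumed 1:5 cost ratio
```

A customer from region `r` would then be scored with `models[region_group[r]].predict(...)`. Raising the weight on the high-value class makes errors on that class more expensive during training, which is one common way to express asymmetric misclassification costs with a support vector machine.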