Telecom customer churn prediction is an important problem in the customer relationship management systems of telecom operators; it aims to identify customers who are at high risk of churning. Building a churn prediction model involves data preprocessing, class imbalance handling, feature selection, and classifier training and evaluation. To address the high feature dimensionality of telecom datasets, a two-stage feature selection method based on Fisher's ratio and the prediction risk criterion is proposed, combining the advantages of filter and embedded feature selection methods. Experimental results on a real-world dataset show that the method reduces feature dimensionality and improves the predictive performance of classifiers.
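
The two-stage idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the classifier, the number of features kept at each stage, or the exact form of the prediction risk. The sketch assumes binary labels, a Fisher ratio of the form (mu1 - mu0)^2 / (var1 + var0), a simple nearest-centroid classifier, and a prediction risk defined as the increase in training error when a feature is replaced by its mean. All function names and the parameters `n_filter` and `n_final` are hypothetical.

```python
import numpy as np

def fisher_ratio(X, y):
    """Per-feature Fisher ratio for binary labels (assumed form)."""
    X0, X1 = X[y == 0], X[y == 1]
    num = (X1.mean(axis=0) - X0.mean(axis=0)) ** 2
    den = X1.var(axis=0) + X0.var(axis=0) + 1e-12  # avoid division by zero
    return num / den

def fit_centroids(X, y):
    """Stand-in classifier (an assumption): one centroid per class."""
    return np.stack([X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)])

def error_rate(centroids, X, y):
    """Misclassification rate of the nearest-centroid rule."""
    dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float((dist.argmin(axis=1) != y).mean())

def prediction_risk(X, y):
    """Error increase when each feature is replaced by its mean (assumed definition)."""
    centroids = fit_centroids(X, y)
    base = error_rate(centroids, X, y)
    risks = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        Xj = X.copy()
        Xj[:, j] = X[:, j].mean()
        risks[j] = error_rate(centroids, Xj, y) - base
    return risks

def two_stage_select(X, y, n_filter, n_final):
    """Stage 1: Fisher-ratio filter. Stage 2: drop the lowest-risk feature until n_final remain."""
    keep = list(np.argsort(fisher_ratio(X, y))[::-1][:n_filter])
    while len(keep) > n_final:
        risks = prediction_risk(X[:, keep], y)
        keep.pop(int(risks.argmin()))
    return sorted(int(i) for i in keep)

# Synthetic demo data (not from the paper): only features 0 and 1 carry signal.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=600)
X = rng.normal(size=(600, 10))
X[:, 0] += 3 * y
X[:, 1] -= 3 * y
selected = two_stage_select(X, y, n_filter=5, n_final=2)
print(selected)
```

The filter stage is cheap because it scores each feature independently, so it can discard clearly uninformative features before the costlier second stage, which has to re-evaluate a classifier for every remaining feature. In practice the prediction risk would be estimated on held-out data rather than on the training set used here.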