Predicting high-value mobile communication users is an important task in telecom customer relationship management. To address the high dimensionality, large scale, and class imbalance of the data encountered when building a prediction model, this paper proposes a prediction method based on effective feature selection. Multiple balanced training sets are extracted from the initial imbalanced dataset by under-sampling. A feature selection strategy that combines Pearson correlation analysis with random forest feature importance assessment is then applied, and weighting and voting mechanisms embedded in the ensemble learning method are used to obtain the optimal feature subset. Finally, the prediction model is built with the random forest algorithm. Experimental results show that the proposed model effectively reduces the dimensionality of the feature set and improves prediction performance for high-value mobile communication users.
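The selection pipeline summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the exact weighting or voting rules, so the ones used here (each balanced subset votes for its top-scoring features, weighted by their mean score) are assumptions, as are all function and parameter names. To stay dependency-free, the sketch scores features by absolute Pearson correlation only. The random forest importance term is omitted; with a library such as scikit-learn it could be taken from `RandomForestClassifier.feature_importances_` and combined with the correlation score.

```python
import math
import random
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def balanced_subsets(y, n_subsets, rng):
    """Yield row indices: all minority rows plus an equal-size random draw of majority rows."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    for _ in range(n_subsets):
        yield minority + rng.sample(majority, len(minority))


def select_features(X, y, n_subsets=5, top_k=2, min_share=0.5, seed=0):
    """Keep features whose share of the weighted votes is at least min_share."""
    rng = random.Random(seed)
    n_features = len(X[0])
    votes = defaultdict(float)
    total_weight = 0.0
    for idx in balanced_subsets(y, n_subsets, rng):
        labels = [y[i] for i in idx]
        scores = [abs(pearson([X[i][j] for i in idx], labels))
                  for j in range(n_features)]
        ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:top_k]
        # Assumed weighting rule: a subset's vote counts as much as the
        # mean score of the features it selects.
        weight = sum(scores[j] for j in ranked) / top_k
        total_weight += weight
        for j in ranked:
            votes[j] += weight
    if total_weight == 0:
        return []
    return sorted(j for j, v in votes.items() if v / total_weight >= min_share)


# Synthetic, imbalanced toy data (hypothetical): 20 high-value users out of 200.
# Features 0 and 1 track the label; features 2 to 4 are pure noise.
data_rng = random.Random(42)
y = [1] * 20 + [0] * 180
X = [[label + data_rng.uniform(-0.1, 0.1),
      -2 * label + data_rng.uniform(-0.2, 0.2),
      data_rng.random(), data_rng.random(), data_rng.random()]
     for label in y]

selected = select_features(X, y)
print(selected)
```

In a fuller version, the rows of `X` restricted to the `selected` columns would then be used to train the final random forest classifier.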