Automobile insurance fraud is spreading worldwide, and identifying fraudulent auto insurance claims is drawing increasing attention. Real-world auto insurance claim data are large in volume and class-imbalanced. To address this, the paper proposes an ensemble classifier that combines a Balanced Random Forest with Ant Colony Optimization. First, feature selection and classification are carried out on the high-dimensional, imbalanced claim data set. The ant colony algorithm searches the feature space using two kinds of heuristic information: the Random Forest feature importance scores and the statistical test scores of the data. The classification accuracy of the Random Forest is fed back to the colony to update the pheromone in real time. In this way, the feature combinations that discriminate fraudulent claims are mined out, and the accuracy of the ensemble classifier is improved. Finally, the Balanced Random Forest model optimized by the ant colony algorithm is applied to auto insurance fraud identification. The empirical results show that the fraud identification model based on the ant-colony-optimized Random Forest classifies and predicts auto insurance claims more effectively and uncovers fraud patterns. It also achieves better accuracy and robustness.
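The abstract names the main ingredients but not the search details. These ingredients are a heuristic built from Random Forest importance and a statistical test score, pheromone updated by classifier accuracy, and balanced per-tree sampling. The sketch below is therefore a minimal illustration under stated assumptions and is not the authors' implementation. The following elements are all hypothetical choices made for illustration: the linear mix in `blend_heuristic` (weight `w`), the fixed `subset_size`, the parameters `alpha`, `beta`, `rho`, and the toy accuracy function. In a real pipeline, `evaluate` would train a Random Forest on the selected columns and return its validation accuracy. The `balanced_bootstrap` helper shows only the per-tree sampling step of a Balanced Random Forest, which draws the same number of cases, with replacement, from each class. Growing the trees themselves is left to an existing implementation such as imbalanced-learn's `BalancedRandomForestClassifier`.

```python
import random
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple


def blend_heuristic(rf_importance: Sequence[float], test_score: Sequence[float],
                    w: float = 0.5, eps: float = 1e-6) -> List[float]:
    """ASSUMPTION: normalize both score vectors to sum to 1 and mix them linearly.
    Both inputs must have a positive sum; eps keeps every feature selectable."""
    s1, s2 = sum(rf_importance), sum(test_score)
    return [w * a / s1 + (1 - w) * b / s2 + eps
            for a, b in zip(rf_importance, test_score)]


def aco_feature_selection(heuristic: Sequence[float],
                          evaluate: Callable[[FrozenSet[int]], float],
                          subset_size: int, n_ants: int = 10, n_iter: int = 20,
                          alpha: float = 1.0, beta: float = 1.0, rho: float = 0.1,
                          seed: int = 0) -> Tuple[FrozenSet[int], float]:
    """Ant colony search over fixed-size feature subsets.

    heuristic[j] > 0 is the static desirability of feature j; evaluate(subset)
    returns classifier accuracy, which is fed back as pheromone deposit."""
    rng = random.Random(seed)
    n = len(heuristic)
    tau = [1.0] * n  # pheromone level per feature
    best_subset: FrozenSet[int] = frozenset()
    best_score = float("-inf")
    for _ in range(n_iter):
        solutions = []
        for _ in range(n_ants):
            chosen = set()
            # Each ant picks features one at a time, with probability
            # proportional to tau^alpha * eta^beta over unchosen features.
            while len(chosen) < subset_size:
                candidates = [j for j in range(n) if j not in chosen]
                weights = [(tau[j] ** alpha) * (heuristic[j] ** beta)
                           for j in candidates]
                chosen.add(rng.choices(candidates, weights=weights, k=1)[0])
            subset = frozenset(chosen)
            score = evaluate(subset)
            solutions.append((subset, score))
            if score > best_score:
                best_subset, best_score = subset, score
        tau = [(1 - rho) * t for t in tau]  # evaporation
        for subset, score in solutions:     # deposit proportional to accuracy
            for j in subset:
                tau[j] += score
    return best_subset, best_score


def balanced_bootstrap(labels: Sequence[int], rng: random.Random) -> List[int]:
    """Row indices for one tree of a Balanced Random Forest: sample, with
    replacement, as many rows from every class as the minority class has."""
    by_class: Dict[int, List[int]] = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    m = min(len(ix) for ix in by_class.values())
    sample: List[int] = []
    for ix in by_class.values():
        sample.extend(rng.choice(ix) for _ in range(m))
    return sample


if __name__ == "__main__":
    # Toy data: 6 features, of which 0 and 3 are (hypothetically) informative.
    informative = {0, 3}
    eta = blend_heuristic([0.30, 0.05, 0.10, 0.35, 0.10, 0.10],
                          [0.25, 0.10, 0.15, 0.30, 0.10, 0.10])
    # Stand-in for Random Forest accuracy on the chosen columns.
    toy_accuracy = lambda s: len(s & informative) / len(informative)
    best, score = aco_feature_selection(eta, toy_accuracy, subset_size=2)
    print(sorted(best), score)

    labels = [0] * 8 + [1] * 2  # 8 legitimate claims, 2 fraudulent
    idx = balanced_bootstrap(labels, random.Random(1))
    print(len(idx), sum(labels[i] for i in idx))  # 4 2
```

Fixing the subset size keeps the sketch short. A variable-length search is also possible, in which an ant may stop when adding a feature no longer helps. The abstract does not say which variant was used.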