Phishing is a form of online fraud that combines social engineering with sophisticated attack vectors: phishing web pages imitate legitimate ones to steal users' sensitive information for illegal purposes. To detect phishing web pages quickly and efficiently, this paper proposes an ensemble-learning-based method for in-depth detection of phishing web pages. The method renders each page to counter common page camouflage techniques and extracts three groups of features from the rendered page: URL and domain features, link and reference features, and page text features. Using ensemble learning, it constructs and trains a separate base classifier for each feature group, and then applies a classification ensemble strategy to combine the base classifiers' outputs into the final result. Experiments on phishing pages from PhishTank show that the proposed method achieves good accuracy and recall.
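The pipeline described above (per-group feature extraction, one base classifier per group, then a combination step) can be illustrated with a minimal sketch. The abstract does not list the concrete features or name the combination rule, so everything here is an assumption made for illustration: the function names `url_features` and `soft_vote`, the four URL features, and the use of weighted soft voting as the ensemble strategy are all hypothetical, not the paper's method.

```python
import re
from urllib.parse import urlparse


def url_features(url: str) -> dict:
    """Extract a few illustrative URL features (hypothetical feature set)."""
    host = urlparse(url).hostname or ""
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        # A raw IPv4 address in place of a domain name is a common phishing cue.
        "host_is_ip": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        "has_at_sign": "@" in url,
    }


def soft_vote(probabilities, weights=None, threshold=0.5):
    """Combine base-classifier phishing probabilities by weighted averaging.

    `probabilities` holds one score in [0, 1] per base classifier, for
    example one each for the URL, link, and page text models. Weighted soft
    voting is an assumed stand-in for the paper's unspecified strategy.
    """
    if not probabilities:
        raise ValueError("at least one base classifier score is required")
    if weights is None:
        weights = [1.0] * len(probabilities)
    if len(weights) != len(probabilities):
        raise ValueError("weights and probabilities must have equal length")
    score = sum(p * w for p, w in zip(probabilities, weights)) / sum(weights)
    return score, score >= threshold
```

In use, each trained base classifier would score its own feature group for a rendered page, and a call such as `soft_vote([p_url, p_link, p_text])` would return the combined score together with a phishing or legitimate decision. Giving each feature group its own model keeps the groups independent, so a page that disguises one signal can still be caught by the others.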