In recent years, malicious websites have come to threaten users in many ways, making the detection of malicious URLs increasingly important. Current detection approaches fall mainly into two groups: blacklist/whitelist techniques and machine learning classification algorithms. Blacklists and whitelists cannot handle URLs that do not appear in any list, and every machine learning classifier has types of data on which it performs poorly. This paper combines the two approaches into a multi-level filtering model for malicious URL detection. A threshold is trained for each filter layer; a filter whose output reaches its threshold classifies the URL directly, and otherwise passes the URL on to the next layer. In this way each classifier handles the kind of data it is best suited to. An example is used to verify that the multi-level filtering model improves the accuracy of URL detection.
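The cascade described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. The names `FilterLayer`, `classify_url`, `length_scorer`, and `keyword_scorer` are hypothetical, the two scorers are toy stand-ins for trained classifiers, and the thresholds are fixed by hand here, whereas the paper trains them. Confidence is assumed to be the larger of the two class probabilities, and the last layer's guess is used as a fallback when no layer is confident; the abstract specifies neither choice.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Set, Tuple

# A scorer returns an estimated probability in [0, 1] that a URL is malicious.
Scorer = Callable[[str], float]


@dataclass
class FilterLayer:
    name: str
    score: Scorer
    threshold: float  # minimum confidence required to decide at this layer


def classify_url(
    url: str,
    blacklist: Set[str],
    whitelist: Set[str],
    layers: Sequence[FilterLayer],
) -> Tuple[str, str]:
    """Return (label, name of the stage that made the decision)."""
    # Level 0: list lookup decides immediately for known URLs.
    if url in blacklist:
        return "malicious", "blacklist"
    if url in whitelist:
        return "benign", "whitelist"

    # Later levels: each classifier decides only if it is confident enough,
    # otherwise the URL is passed on to the next layer.
    p = 0.5
    for layer in layers:
        p = layer.score(url)
        confidence = max(p, 1.0 - p)
        if confidence >= layer.threshold:
            return ("malicious" if p >= 0.5 else "benign"), layer.name

    # Assumption: if no layer is confident, use the last layer's best guess.
    return ("malicious" if p >= 0.5 else "benign"), "fallback"


# Toy scorers standing in for trained classifiers (illustrative only).
def length_scorer(url: str) -> float:
    return min(len(url) / 100, 1.0)


def keyword_scorer(url: str) -> float:
    return 0.9 if "login" in url else 0.2


BLACKLIST = {"http://bad.example/"}
WHITELIST = {"http://good.example/"}
LAYERS = [
    FilterLayer("length", length_scorer, threshold=0.9),
    FilterLayer("keyword", keyword_scorer, threshold=0.7),
]

if __name__ == "__main__":
    for u in ["http://bad.example/", "a.co", "http://x.example/login"]:
        print(u, classify_url(u, BLACKLIST, WHITELIST, LAYERS))
```

In a real system each `score` function would wrap a trained model's probability output, and each threshold would be chosen on validation data so that a layer only decides the cases it handles reliably.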