To counter the obfuscation techniques commonly used in phishing URLs, this paper proposes RMLR, a phishing web page detection method based on rule matching and logistic regression. First, a given web page is classified against a rule base built to target obfuscation techniques such as violating URL naming standards and hiding the phishing target word. If the rules identify the URL as phishing, the subsequent feature extraction and detection steps are skipped, which meets the need for real-time detection. If the URL cannot be directly identified as phishing, its features are extracted and a logistic regression classifier performs a second-stage detection, which improves adaptability and accuracy and reduces the false positive rate caused by an insufficiently large rule base. In addition, RMLR introduces a Jaccard-based random domain name recognition method, built on string similarity, to assist in detecting phishing URLs. Experimental results show that RMLR achieves an accuracy of 98.7%, indicating good detection performance.
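The two-stage flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the three rules, the `TARGET_WORDS` and `REFERENCE_NAMES` lists, the feature set, the `WEIGHTS`/`BIAS` values, and the use of character bigrams for the Jaccard comparison are all assumptions made for the example, since the abstract does not specify them. In practice the logistic regression weights would be learned from labeled URLs.

```python
import math
import re
from urllib.parse import urlsplit

# Hypothetical inputs: the abstract does not list the actual rules or word lists.
TARGET_WORDS = ("paypal", "apple", "ebay")
REFERENCE_NAMES = ("google", "paypal", "amazon", "microsoft", "example")


def rule_match(url: str) -> bool:
    """Stage 1: return True if an (assumed) rule flags the URL outright."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Assumed rule 1: '@' in the authority part hides the real host.
    if "@" in parts.netloc:
        return True
    # Assumed rule 2: a raw IPv4 address is used instead of a domain name.
    if re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host):
        return True
    # Assumed rule 3: a target word sits in a subdomain or the path, but not
    # in the registered domain (naively taken as the last two host labels).
    registered = ".".join(host.split(".")[-2:])
    rest = (host[: len(host) - len(registered)] + parts.path).lower()
    return any(w in rest and w not in registered for w in TARGET_WORDS)


def bigrams(s: str) -> set:
    """Set of adjacent character pairs in s."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| over character bigrams."""
    sa, sb = bigrams(a), bigrams(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def looks_random(label: str, threshold: float = 0.2) -> bool:
    """Treat a domain label as random if it resembles no reference name."""
    return max(jaccard(label, ref) for ref in REFERENCE_NAMES) < threshold


def features(url: str) -> list:
    """Stage 2 input: a small, illustrative feature vector."""
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    label = labels[-2] if len(labels) >= 2 else host
    return [len(url) / 100.0, float(host.count(".")),
            1.0 if looks_random(label) else 0.0]


# Placeholder parameters, not values from the paper.
WEIGHTS = [1.0, 0.5, 3.0]
BIAS = -3.0


def lr_probability(x: list) -> float:
    """Logistic regression score: sigmoid of the weighted feature sum."""
    z = BIAS + sum(w * v for w, v in zip(WEIGHTS, x))
    return 1.0 / (1.0 + math.exp(-z))


def classify(url: str) -> str:
    """Rules first; fall back to logistic regression only if no rule fires."""
    if rule_match(url):
        return "phishing"  # feature extraction is skipped entirely
    return "phishing" if lr_probability(features(url)) >= 0.5 else "legitimate"
```

Calling `classify("http://192.168.0.1/login")` returns from the rule stage without computing any features, which is the shortcut the abstract credits for real-time performance. A URL that passes the rules, such as `https://www.paypal.com/signin`, falls through to the logistic regression stage.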