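Web phishing deceives users with look-alike websites in order to steal confidential personal information, and it has become a major threat to electronic financial activity. In response, this paper proposes an architecture for detecting phishing Web pages. For the detection mechanism itself, it proposes a Web page similarity algorithm based on nested EMD (Nested Earth Mover's Distance): the image of a Web page is segmented, features of the sub-images are extracted, and an ARG (Attributed Relational Graph) of the page is constructed; once the distances between the attributes of different ARGs have been computed, the nested EMD method is used to compute the similarity of the pages, and phishing sites are detected on that basis. Experimental results show that, compared with existing international research, the algorithm achieves higher accuracy and stronger adaptability.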
Web phishing has become a major threat to online applications such as financial services: it steals user identities and credentials by imitating the sites of service providers such as banks. This paper proposes a novel architecture for phishing Web page detection, specifying its function modules and processing workflow, together with a visual Web page similarity detection algorithm. Starting from the image of a suspicious Web page, the algorithm first divides the page into sub-block images, from which features and relations are extracted to form the ARG (Attributed Relational Graph) of the page. Then, from the ARGs of two Web pages, the Nested-EMD (Nested Earth Mover's Distance) between them is computed as their similarity, and a decision is reached by comparing the degree of similarity between the two pages. The algorithm is implemented and compared with recent international research; the experiments show that it is better in accuracy and robustness.
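The idea of nesting one EMD inside another can be illustrated with a minimal sketch. The abstract does not spell out the ARG attributes, the ground distances, or the exact nesting, so everything below is an assumption made for illustration: each page is reduced to a list of `(weight, histogram)` blocks (ARG relations between blocks are ignored), the inner EMD compares two block histograms using the bin-index difference as ground distance, and the outer EMD compares two pages using those inner distances as its ground distance. The names `emd`, `block_distance`, and `nested_emd` are hypothetical, and the small min-cost-flow solver is a generic one, not the paper's implementation.

```python
EPS = 1e-9
INF = float("inf")

def emd(supply, demand, cost):
    """EMD between two weighted sets, solved as a min-cost flow
    (successive shortest paths with Bellman-Ford); normalised by moved mass."""
    m, n = len(supply), len(demand)
    src, dst, size = m + n, m + n + 1, m + n + 2
    edges, graph = [], [[] for _ in range(size)]  # edge: [to, residual cap, cost]

    def add_edge(u, v, cap, c):
        # Forward edge at an even index, its reverse at the next odd index.
        graph[u].append(len(edges)); edges.append([v, cap, c])
        graph[v].append(len(edges)); edges.append([u, 0.0, -c])

    for i, s in enumerate(supply):
        add_edge(src, i, s, 0.0)
    for j, d in enumerate(demand):
        add_edge(m + j, dst, d, 0.0)
    for i in range(m):
        for j in range(n):
            add_edge(i, m + j, INF, cost[i][j])

    total = min(sum(supply), sum(demand))
    moved, work = 0.0, 0.0
    while total - moved > EPS:
        # Cheapest residual path from src to dst.
        dist, prev = [INF] * size, [-1] * size
        dist[src] = 0.0
        for _ in range(size - 1):
            changed = False
            for u in range(size):
                if dist[u] == INF:
                    continue
                for k in graph[u]:
                    v, cap, c = edges[k]
                    if cap > EPS and dist[u] + c < dist[v] - 1e-12:
                        dist[v], prev[v], changed = dist[u] + c, k, True
            if not changed:
                break
        if dist[dst] == INF:
            break
        # Find the bottleneck along the path, then push that much flow.
        push, v = total - moved, dst
        while v != src:
            k = prev[v]
            push = min(push, edges[k][1])
            v = edges[k ^ 1][0]
        v = dst
        while v != src:
            k = prev[v]
            edges[k][1] -= push
            edges[k ^ 1][1] += push
            v = edges[k ^ 1][0]
        moved += push
        work += push * dist[dst]
    return work / moved if moved > 0 else 0.0

def block_distance(hist_a, hist_b):
    """Inner EMD: two block histograms, ground distance = bin-index gap (assumed)."""
    cost = [[abs(p - q) for q in range(len(hist_b))] for p in range(len(hist_a))]
    return emd(hist_a, hist_b, cost)

def nested_emd(page_a, page_b):
    """Outer EMD: two pages, ground distance = inner EMD between their blocks."""
    cost = [[block_distance(ha, hb) for _, hb in page_b] for _, ha in page_a]
    return emd([w for w, _ in page_a], [w for w, _ in page_b], cost)

# Toy pages: two equally weighted blocks, each with a 3-bin histogram.
page_a = [(0.5, [1.0, 0.0, 0.0]), (0.5, [0.0, 0.0, 1.0])]
page_b = [(0.5, [1.0, 0.0, 0.0]), (0.5, [0.0, 1.0, 0.0])]
distance = nested_emd(page_a, page_b)
```

In this sketch a smaller distance means more similar pages. A detector along these lines would compare the distance between a suspicious page and a protected page against a threshold, which would have to be tuned on real data.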