A cascaded filtering system for image-based email was studied and designed, reducing the probability of misclassifying image mail. A scheme for further categorizing spam images was also proposed. First, a coarse classification was performed using low-level image features and a support vector machine (SVM), which identified most legitimate email images. Then, a more precise classification was performed based on the bag-of-words model. Finally, text-region information was extracted from the spam images and, using a nearest-neighbor classification algorithm and matching against a sensitive-word lexicon, the image spam was subdivided into categories such as advertisements, false invoices, pornography, and illegal comments. The system thus supports effective management and monitoring of image spam.
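The three-stage control flow described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the SVM stage, the bag-of-words stage, and the text-region extraction are replaced by caller-supplied functions, and the names `Cascade`, `nearest_category`, `SENSITIVE_WORDS`, the example words, the Jaccard distance, and the "uncategorized" fallback are all assumptions made for illustration. The final stage is simplified to a nearest-neighbor lookup over one prototype word set per category.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set

# Hypothetical lexicon: the abstract names the categories, not the words.
SENSITIVE_WORDS: Dict[str, Set[str]] = {
    "advertisement": {"sale", "discount", "offer", "buy"},
    "false_invoice": {"invoice", "receipt", "tax", "vat"},
    "pornography": {"adult", "xxx"},
    "illegal_comment": {"banned", "slogan"},
}


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """Return 1 - |a & b| / |a | b|; two empty sets count as maximally distant."""
    union = a | b
    if not union:
        return 1.0
    return 1.0 - len(a & b) / len(union)


def nearest_category(words: Set[str],
                     lexicon: Dict[str, Set[str]] = SENSITIVE_WORDS) -> str:
    """Pick the category whose word set is closest to the extracted words."""
    best = min(lexicon, key=lambda label: jaccard_distance(words, lexicon[label]))
    # Assumed fallback: no shared word with any category means no decision.
    if jaccard_distance(words, lexicon[best]) >= 1.0:
        return "uncategorized"
    return best


@dataclass
class Cascade:
    # Stage 1 stand-in for the low-level-feature SVM: True means "legitimate".
    coarse_is_ham: Callable[[dict], bool]
    # Stage 2 stand-in for the bag-of-words classifier: True means "spam".
    fine_is_spam: Callable[[dict], bool]
    # Stage 3 stand-in for text-region extraction: returns the words found.
    extract_words: Callable[[dict], Set[str]]

    def classify(self, image: dict) -> str:
        if self.coarse_is_ham(image):      # most legitimate images stop here
            return "ham"
        if not self.fine_is_spam(image):   # finer check on the remainder
            return "ham"
        return nearest_category(self.extract_words(image))


# Toy stand-ins so the sketch runs; a real system would plug in trained models.
demo = Cascade(
    coarse_is_ham=lambda img: img.get("looks_normal", False),
    fine_is_spam=lambda img: img.get("spam_score", 0.0) > 0.5,
    extract_words=lambda img: set(img.get("words", [])),
)
```

Ordering the stages this way means the cheaper coarse test handles most legitimate images, so the later stages run only on the images that remain.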