This paper addresses the challenge of large-margin classification for spam filtering in the presence of an adversary who disguises spam emails to avoid detection. In practice, the adversary may strategically add good words indicative of a legitimate message or remove bad words indicative of spam. We assume that the adversary can afford to modify a spam message only to a limited extent, so as not to damage its utility for the spammer. Under this assumption, we present a large-margin approach for classifying spam messages that may be disguised. The proposed classifier is formulated as a second-order cone programming (SOCP) optimization problem. We performed a set of experiments using the TREC 2006 Spam Corpus. The results show that the performance of the standard support vector machine (SVM) degrades rapidly as more words are injected or removed by the adversary, while the proposed approach is more stable under the disguise attack.
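To make the threat model concrete, the following is a minimal Python sketch of a budget-limited disguise attack against a linear spam scorer. It is not the SOCP classifier proposed here. The vocabulary, weights, bias, and the function names `score` and `disguise` are all hypothetical and chosen only for illustration. The sketch assumes a binary bag-of-words representation, under which each word edit changes the score independently, so picking the largest-gain edits first is the adversary's best strategy for a fixed linear model.

```python
from typing import Dict, Set

# Hypothetical linear model: positive weights mark "bad" (spammy) words,
# negative weights mark "good" (legitimate) words. Values are made up.
WEIGHTS: Dict[str, float] = {
    "viagra": 2.0, "free": 1.5, "winner": 1.0,
    "meeting": -1.25, "report": -1.0, "thanks": -0.75,
}
BIAS = -0.5
SPAM: Set[str] = {"viagra", "free", "winner"}


def score(weights: Dict[str, float], bias: float, words: Set[str]) -> float:
    """Linear score over a binary bag of words; a score > 0 is classified as spam."""
    return bias + sum(weights.get(w, 0.0) for w in words)


def disguise(weights: Dict[str, float], words: Set[str], budget: int) -> Set[str]:
    """Apply at most `budget` single-word edits that lower the spam score the most."""
    candidates = []
    for word, weight in weights.items():
        if word in words and weight > 0:
            # Removing a bad word lowers the score by its weight.
            candidates.append((weight, "remove", word))
        elif word not in words and weight < 0:
            # Injecting a good word lowers the score by |weight|.
            candidates.append((-weight, "add", word))
    candidates.sort(reverse=True)  # largest score reduction first

    disguised = set(words)
    for _gain, action, word in candidates[:budget]:
        if action == "remove":
            disguised.discard(word)
        else:
            disguised.add(word)
    return disguised


if __name__ == "__main__":
    for budget in range(5):
        msg = disguise(WEIGHTS, SPAM, budget)
        print(budget, sorted(msg), score(WEIGHTS, BIAS, msg))
```

With these illustrative weights, the message is still flagged after two edits and slips below the decision threshold after three. That mirrors the assumption above: the adversary's edit budget is limited, so a classifier only needs to stay correct for messages within that budget of a true spam message.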