Because false positives degrade email filtering performance more severely than false negatives, it is necessary to study how to make an email filter more sensitive to the cost of false positives. This paper proposes an improved fitted Logistic Regression model for spam classification. The model introduces a weight coefficient function with a biased-dependence property, which enables asymmetric (unbalanced) classifier training. Tests on real email sample sets show that, with no loss of classification precision, the new model exhibits a clear biased dependence between the false positive rate and the false negative rate. The results also indicate that the model is robust to perturbed feature data.
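The abstract does not give the form of the weight coefficient function, so the following is only a minimal sketch of the general idea of asymmetric training, not the paper's model. It assumes a constant class-dependent weight: the log-loss term of every legitimate message is multiplied by a hypothetical parameter `ham_weight`, so that false positives cost more than false negatives during fitting. All function and parameter names here are illustrative assumptions.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_weighted_lr(X, y, ham_weight=1.0, lr=0.1, epochs=5000):
    """Fit logistic regression by batch gradient descent on a weighted log-loss.

    y[i] is 1 for spam and 0 for legitimate mail (ham). Each ham sample's loss
    term is scaled by ham_weight (an assumed stand-in for the paper's weight
    coefficient function); spam samples keep weight 1.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    weights = [ham_weight if label == 0 else 1.0 for label in y]
    total = sum(weights)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi, ci in zip(X, y, weights):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = ci * (p - yi)  # weighted gradient of the log-loss w.r.t. the logit
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / total for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / total
    return w, b


def spam_probability(model, x):
    # Predicted probability that feature vector x is spam.
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def false_positives(model, X, y, threshold=0.5):
    # Count ham messages (label 0) that the model would flag as spam.
    return sum(
        1 for xi, yi in zip(X, y)
        if yi == 0 and spam_probability(model, xi) >= threshold
    )


if __name__ == "__main__":
    # Toy one-feature data with overlapping classes (made up for illustration).
    X = [[0.0], [1.0], [2.0], [3.0], [2.0], [3.0], [4.0], [5.0]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    symmetric = train_weighted_lr(X, y, ham_weight=1.0)
    asymmetric = train_weighted_lr(X, y, ham_weight=5.0)
    print("P(spam | x=2.5), symmetric :", spam_probability(symmetric, [2.5]))
    print("P(spam | x=2.5), asymmetric:", spam_probability(asymmetric, [2.5]))
```

Raising `ham_weight` pushes the decision boundary toward the spam side, trading a lower false positive rate for a higher false negative rate. That is the kind of skewed trade-off the abstract describes, although the paper's own weighting function may behave differently.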