This paper presents a method for combining multiple naive Bayesian (NB) filters based on a Gaussian mixture model (GMM) and applies it to e-mail filtering. The method applies multivariate statistical analysis to the matrix that records how each filter classifies each training example, reducing its dimensionality and removing noise, which yields a distribution of the training data and of the individual filters. A GMM that decides the class of a message is then learned from this distribution. The GMM filters previously unseen e-mails by minimizing the expected cost of a decision, so as to avoid classifying legitimate e-mail as spam. Experimental results show that the method filters well and that its performance is insensitive to the feature selection ratio.
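The minimum-expected-cost decision described above can be illustrated with a minimal sketch. It is not the implementation used in this work: it assumes each message has already been reduced to a single combined score `x`, that each class is modeled by a one-dimensional Gaussian mixture, and that blocking a legitimate message costs `cost_ratio` times as much as passing a spam message. The mixture parameters, the function names, and the value of `cost_ratio` are all hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, components):
    """Density of a 1-D Gaussian mixture; components are (weight, mean, var)."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in components)


def posterior_spam(x, spam_gmm, ham_gmm, prior_spam=0.5):
    """P(spam | x) by Bayes' rule from the two class-conditional mixtures."""
    p_spam = prior_spam * mixture_pdf(x, spam_gmm)
    p_ham = (1.0 - prior_spam) * mixture_pdf(x, ham_gmm)
    return p_spam / (p_spam + p_ham)


def classify(x, spam_gmm, ham_gmm, cost_ratio=9.0, prior_spam=0.5):
    """Pick the label with the lower expected cost.

    Assumed costs: blocking a legitimate message costs cost_ratio,
    passing a spam message costs 1, correct decisions cost 0.
    """
    p = posterior_spam(x, spam_gmm, ham_gmm, prior_spam)
    cost_if_blocked = cost_ratio * (1.0 - p)  # the message was legitimate
    cost_if_passed = 1.0 * p                  # the message was spam
    return "spam" if cost_if_blocked < cost_if_passed else "legitimate"


# Hypothetical mixture parameters, chosen only for illustration.
SPAM_GMM = [(0.6, 2.0, 1.0), (0.4, 4.0, 0.5)]
HAM_GMM = [(1.0, -1.0, 1.0)]

if __name__ == "__main__":
    for score in (-2.0, 1.0, 5.0):
        print(score, classify(score, SPAM_GMM, HAM_GMM))
```

Under these assumptions a message is blocked only when P(spam | x) exceeds `cost_ratio / (1 + cost_ratio)`, so raising `cost_ratio` makes the filter more reluctant to discard mail: the score 1.0 is labeled spam with `cost_ratio=1.0` but legitimate with `cost_ratio=9.0`.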