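Learning several related tasks simultaneously may generalize better than learning each task in isolation; this is the starting point of the multitask learning paradigm. Inspired by this idea, we study and develop a multilevel mail filtering system. A base classifier is first built for each user; the EM algorithm is then used to estimate the correlation coefficients between the base classifiers, which finally yields the mail filtering discriminant function for a given user. Experimental results show that the system is reliable and effective on both Chinese and English corpora, and that it achieves good filtering performance even with few training examples. The quality of the filter ultimately also depends on the parameter values of the prior over the correlation coefficients and on the choice of base classifiers.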
Learning a group of related tasks simultaneously may yield better generalization than learning each task individually. Inspired by this observation, this paper proposes a multilevel spam filter based on correlation coefficients. The system uses the EM algorithm to estimate the correlation coefficients, from which it derives a spam discriminant function. Experiments show that the system is reliable and effective on both Chinese and English corpora, and that it performs well even with a small sample set. The performance of the system ultimately depends on the parameter values of the prior on the correlation coefficients and on the choice of base classifiers.
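The abstract does not give the exact form of the correlation coefficients or of their prior, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that each base classifier outputs a spam probability, that the "correlation coefficient" of a base classifier can be modeled as a mixture weight, and that the prior is a symmetric Dirichlet with a hypothetical parameter `alpha`. The function names `em_mixture_weights` and `spam_score` are illustrative and do not come from the paper.

```python
from typing import List, Sequence


def em_mixture_weights(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    alpha: float = 2.0,
    iters: int = 50,
) -> List[float]:
    """MAP-EM estimate of one weight per base classifier (assumed model).

    probs[i][k] is base classifier k's P(spam) for message i of the target
    user; labels[i] is 1 for spam and 0 for legitimate mail. alpha >= 1 is
    the parameter of an assumed symmetric Dirichlet prior on the weights.
    """
    if alpha < 1.0:
        raise ValueError("alpha must be >= 1 so the weights stay non-negative")
    n, k = len(probs), len(probs[0])
    # Likelihood that each base classifier assigns to the true label.
    lik = [[p if y == 1 else 1.0 - p for p in row]
           for row, y in zip(probs, labels)]
    w = [1.0 / k] * k  # start from uniform weights
    for _ in range(iters):
        counts = [0.0] * k
        for row in lik:
            # E-step: responsibility of each base classifier for this message.
            joint = [wj * lj for wj, lj in zip(w, row)]
            total = sum(joint)
            resp = [j / total for j in joint] if total > 0.0 else list(w)
            for j in range(k):
                counts[j] += resp[j]
        # M-step: MAP update under the Dirichlet(alpha) prior.
        denom = n + k * (alpha - 1.0)
        w = [(c + alpha - 1.0) / denom for c in counts]
    return w


def spam_score(weights: Sequence[float],
               message_probs: Sequence[float]) -> float:
    """Discriminant: weighted average of the base classifiers' P(spam)."""
    return sum(w * p for w, p in zip(weights, message_probs))


if __name__ == "__main__":
    # Toy data: classifier 0 agrees with the labels, classifier 1 does not.
    probs = [[0.9, 0.2], [0.8, 0.3], [0.1, 0.7], [0.2, 0.9]]
    labels = [1, 1, 0, 0]
    weights = em_mixture_weights(probs, labels)
    print(weights, spam_score(weights, [0.95, 0.4]))
```

In this sketch a larger `alpha` pulls the weights toward uniform, which is one way to see why the choice of prior parameters matters most when the target user has few labeled messages.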