To learn the ensemble function and improve classification performance, a two-phase ensemble learning method named TPEL (Two-Phases Ensemble Learning) is proposed. TPEL is evaluated on spam filtering, a typical two-class text classification problem, through a series of experiments on four publicly available datasets. The experimental results show the following. First, the performance of TPEL is only slightly affected by the number of individual classifiers being combined. Second, TPEL is particularly effective when it combines multiple heterogeneous classifiers. Third, when it combines multiple homogeneous classifiers, TPEL outperforms comparison algorithms such as Naive Bayes, Bagging, and Boosting in the great majority of cases, and it works well whether the base learner is stable or unstable. Finally, TPEL has relatively low time complexity.