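Defect prediction models built on a single classifier have reached a performance bottleneck, whereas ensemble classifiers often show a significant performance advantage over single classifiers. With the aim of building efficient ensemble defect prediction models, this paper compares the algorithms and characteristics of seven different types of ensemble classifiers. Experiments on 14 benchmark datasets show that some of the ensemble prediction models outperform a single prediction model based on naive Bayes. Among them, the voting-based ensemble framework achieves the best prediction performance, with a statistically significant advantage, followed by the random forest algorithm. The Stacking ensemble framework also shows strong generalization ability.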
Many researchers have advocated software defect prediction using classification algorithms. However, several recent studies show that applying a single classifier has hit a performance bottleneck. Classifier ensembles, on the other hand, can effectively improve classification performance over a single classifier. This paper presents a comparative study of various ensemble methods from a taxonomic perspective. A series of benchmarking experiments on the public-domain MDP datasets shows that applying classifier ensemble methods to defect prediction can achieve better performance than using a single classifier. In particular, of the seven ensemble methods involved in these experiments, voting and random forest show a clear performance advantage over the others, and Stacking also shows better generalization ability.
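To make the voting idea concrete, the following is a minimal sketch of hard (majority) voting over the labels produced by several base classifiers. It is an illustration only and not the implementation used in the experiments: the function name `majority_vote`, the three base classifiers, and their predictions are all hypothetical, and labels are assumed to be 1 for a defective module and 0 for a clean one.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-classifier label lists into one label per module.

    predictions[i][j] is the label that base classifier i assigns to module j.
    """
    if not predictions:
        raise ValueError("need at least one base classifier")
    n_modules = len(predictions[0])
    if any(len(p) != n_modules for p in predictions):
        raise ValueError("all classifiers must label the same modules")

    combined = []
    # zip(*predictions) yields, for each module, the votes of all classifiers.
    for votes in zip(*predictions):
        # most_common(1) returns the label with the highest count;
        # on a tie, the label that was seen first wins.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical predictions of three base classifiers on five modules
# (1 = defective, 0 = clean); these values are made up for illustration.
naive_bayes = [1, 0, 0, 1, 0]
decision_tree = [1, 1, 0, 0, 0]
nearest_neighbor = [0, 1, 0, 1, 1]

print(majority_vote([naive_bayes, decision_tree, nearest_neighbor]))
# prints [1, 1, 0, 1, 0]
```

With an odd number of base classifiers and binary labels, no ties can occur, which is one reason such a setup is convenient for a sketch like this.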