差异性是分类器集成具有高泛化能力的必要条件.然而,目前对差异性度量、有效性及分类器优化集成都没有统一的分析和处理方法.针对上述问题,本文一方面从差异性度量方法、差异性度量有效性分析和相应的分类器优化集成技术三个角度,全面总结与分析了基于差异性的分类器集成.同时,本文还通过向量空间模型形象地论证了差异性度量的有效性.另一方面,本文针对多种典型的基于差异性的分类器集成技术(Bagging、boosting、GA-based、quadratic programming (QP)、semi-definite programming (SDP)、regularized selective ensemble (RSE))在UCI数据库和USPS数据库上进行了对比实验与性能分析,并对如何选择差异性度量方法和具体的优化集成技术给出了可行性建议.
Diversity is a necessary condition for high generalization capability in classifier ensembles. However, there is currently no unified method for analyzing and handling diversity measures, their effectiveness, or the optimization of classifier ensembles. To address these issues, this paper first gives a comprehensive summary and analysis of diversity-based classifier ensembles from three aspects: diversity measures, analysis of the effectiveness of those measures, and the corresponding ensemble optimization techniques. The effectiveness of diversity measures is also demonstrated intuitively using the vector space model. Second, comparative experiments and performance analysis are carried out on UCI data sets and the USPS data set for several typical diversity-based ensemble techniques (Bagging, boosting, GA-based, quadratic programming (QP), semi-definite programming (SDP), and regularized selective ensemble (RSE)). Finally, practical suggestions are given on how to select a diversity measure and a specific ensemble optimization technique.