Ensemble learning and ensemble pruning (selective ensemble) are active research topics in the machine learning community. However, most published results rest on private, unreleased experimental implementations. This practice wastes effort, because researchers repeatedly reimplement the same methods, and it hinders the practical adoption of ensemble pruning, because the various implementations are complex and differ from one another. This paper discusses the key issues in designing a research and development platform for ensemble learning and ensemble pruning, together with the structure of such a system. The platform aims to reduce the experimental workload of developing and testing ensemble pruning methods, to improve the reproducibility of experiments, to reduce discrepancies between the conclusions of different experiments, and to accelerate the application of these techniques to real-world problems. Taking EPP (Ensemble Pruning Platform) as an example, the paper then describes the approach and key workflows for implementing such a platform in C++.