Classifier ensemble learning is an active research topic in machine learning. However, the classical approach selects feature subspaces completely at random, which makes it difficult to guarantee the performance of the sub-classifiers on high-dimensional data. To address this, a classifier ensemble algorithm based on local random subspaces is proposed. The algorithm first applies a feature selection method to obtain a ranked feature sequence, then partitions the sequence into several segments and samples features at random within each segment according to that segment's sampling rate. This improves both the performance of the sub-classifiers and their diversity. Experiments on 5 UCI datasets and 5 gene datasets show that the proposed method outperforms a single classifier and, in most cases, the classical classifier ensemble methods.
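
The sampling step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the feature selection criterion, the number of segments, or the sampling rates, so the absolute-correlation ranking, the function names (`rank_features`, `local_random_subspace`, `build_subspaces`), and the example rates are all assumptions made for illustration. Training a sub-classifier on each subspace and combining the votes is omitted.

```python
import numpy as np

def rank_features(X, y):
    """Return feature indices ordered from most to least relevant.

    Assumption: absolute Pearson correlation with the label stands in
    for the (unspecified) feature selection method.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    scores = np.abs(Xc.T @ yc) / denom
    return np.argsort(-scores)

def local_random_subspace(ranked, rates, rng):
    """Split the ranked list into len(rates) segments and sample each one.

    rates[i] is the fraction of segment i to draw (hypothetical values);
    sampling is without replacement inside each segment.
    """
    segments = np.array_split(ranked, len(rates))
    chosen = []
    for seg, rate in zip(segments, rates):
        k = min(len(seg), int(round(rate * len(seg))))
        chosen.append(rng.choice(seg, size=k, replace=False))
    return np.concatenate(chosen)

def build_subspaces(X, y, n_members, rates, seed=0):
    """Draw one feature subset per ensemble member from a shared ranking."""
    rng = np.random.default_rng(seed)
    ranked = rank_features(X, y)
    return [local_random_subspace(ranked, rates, rng) for _ in range(n_members)]
```

For example, `build_subspaces(X, y, n_members=10, rates=(0.5, 0.3, 0.1))` would draw, for each of ten sub-classifiers, half of the top-ranked third of the features, 30% of the middle third, and 10% of the bottom third. Sampling more densely from the highly ranked segments is one plausible way to keep each sub-classifier reasonably accurate, while the randomness within each segment keeps the members different from one another.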