旋转森林(rotation forest,RoF)是一种运用线性分析理论和决策树的集成分类算法,在分类器个数较少的情况下仍可以取得良好的结果,同时能保证集成分类的准确性。但对于部分基因数据集,存在线性不可分的情况,原始的算法分类效果不佳。提出了一种运用核主成分分析变换的旋转森林算法(rotation forest algorithm based on kernel principal component analysis,KPCA-RoF),选择高斯径向基核函数和主成分分析的方法对基因数据集进行非线性映射和差异性变化,着重于参数的选择问题,再利用决策树算法进行集成学习。实验证明,改进后的算法能很好地解决数据线性不可分的情形,同时也提高了基因数据集上的分类精度。
Rotation forest(RoF)algorithm is an ensemble classification algorithm using linear analysis theory and decision trees.The rotation forest achieves higher classification accuracy and superior performance with less number of classifiers.However,the classification accuracy decreases for gene expression data with linearly inseparable cases.To address this issue,this paper proposes a rotation forest algorithm based on kernel principal component analysis(KPCA-RoF),chooses the Gaussian kernel function and principal component analysis to implement the nonlinear mapping and deal with differences in gene data.The proposed algorithm focuses on the optimization of parameters,and uses decision tree algorithm for ensemble learning.Experiments show that the new algorithm well addresses the linearly inseparabal issue and improves the classification accuracy.