When a high-speed data stream arrives faster than an ensemble classifier can process it, the ensemble cannot train on all recently arrived data to update its model. To address this problem, a biased-sampling ensemble classifier algorithm is proposed. The expected error of the ensemble classifier is analyzed through bias-variance decomposition, and biased sampling is carried out by computing the expected-error contribution of each instance awaiting sampling, which effectively shortens the time needed to train and update the ensemble. The method is compared with an ensemble classifier based on random sampling. Theoretical analysis and experimental results show that, at the same sampling ratio, the proposed method effectively improves the classification accuracy of the ensemble classifier.
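The abstract does not give the formula for the expected-error contribution, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes 0-1 loss, takes the bias term of an instance to be the loss of the ensemble's majority vote and the variance term to be the fraction of members that disagree with that vote, and uses their sum as a stand-in score. Instances are then drawn without replacement with probability increasing in the score, using weighted random keys. The names `error_contribution`, `biased_sample`, and the `floor` parameter are hypothetical.

```python
import random
from collections import Counter


def error_contribution(votes, label):
    """Assumed stand-in score for one instance under 0-1 loss.

    votes: class labels predicted by the ensemble members.
    label: the true class label of the instance.
    """
    majority, _ = Counter(votes).most_common(1)[0]
    # Bias term: 1 if the majority vote is wrong, else 0.
    bias = 0.0 if majority == label else 1.0
    # Variance term: fraction of members disagreeing with the majority vote.
    variance = sum(v != majority for v in votes) / len(votes)
    return bias + variance


def biased_sample(batch, votes_per_instance, ratio, rng, floor=0.05):
    """Draw about ratio * len(batch) instances, favouring high scores.

    batch: list of (features, label) pairs awaiting sampling.
    votes_per_instance: member predictions for each instance in batch.
    floor: assumed small constant so zero-score instances stay eligible.
    """
    k = max(1, int(ratio * len(batch)))
    keyed = []
    for (x, y), votes in zip(batch, votes_per_instance):
        weight = error_contribution(votes, y) + floor
        # Weighted random key: a larger weight tends to give a larger key.
        key = rng.random() ** (1.0 / weight)
        keyed.append((key, (x, y)))
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in keyed[:k]]


if __name__ == "__main__":
    rng = random.Random(0)
    batch = [((0.1,), "a"), ((0.9,), "b"), ((0.5,), "a"), ((0.4,), "b")]
    votes = [["a", "a", "a"], ["b", "b", "b"], ["b", "b", "a"], ["b", "a", "b"]]
    print(biased_sample(batch, votes, ratio=0.5, rng=rng))
```

Replacing the weighted key with a plain `rng.sample(batch, k)` would give the random-sampling baseline at the same sampling ratio, which is the comparison the abstract describes.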