This paper studies diversity among classifiers and proposes an incremental ensemble classification algorithm for data streams based on an information-entropy diversity measure. The measure is integrated into the selection of base classifiers: the entropy diversity of the base classifiers' results on the training data set is computed, and a cyclic iterative selection procedure, with optimal entropy diversity as the constraint objective, dynamically adjusts the number of base classifiers. This yields accurate and stable classification while reducing system overhead. Comparative experiments show that, when processing data streams, the algorithm incurs lower overhead and adapts better than the other algorithms compared.
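The selection idea can be illustrated with a minimal sketch. The abstract does not give the exact entropy diversity formula or the selection loop, so everything below is an assumption: the measure is the standard Kuncheva-Whitaker entropy measure E, swapped in for the paper's own measure, and the loop is a plain backward elimination that drops one base classifier at a time while E keeps increasing. The names `entropy_diversity`, `select_by_entropy`, and `min_size` are hypothetical.

```python
import math


def entropy_diversity(correct):
    """Kuncheva-Whitaker entropy measure E (assumed stand-in for the paper's measure).

    correct[i][j] is True when base classifier i labels training sample j correctly.
    Returns a value in [0, 1]; 0 means every classifier agrees on every sample.
    """
    n_clf = len(correct)
    if n_clf < 2:
        return 0.0  # diversity is undefined for a single classifier
    n_samples = len(correct[0])
    denom = n_clf - math.ceil(n_clf / 2)
    total = 0.0
    for j in range(n_samples):
        hits = sum(1 for i in range(n_clf) if correct[i][j])
        total += min(hits, n_clf - hits) / denom
    return total / n_samples


def select_by_entropy(correct, min_size=2):
    """Iteratively drop the classifier whose removal raises E the most.

    Stops when no single removal improves E or when min_size is reached,
    so the ensemble size is adjusted by the diversity objective itself.
    Returns (indices of the kept classifiers, their entropy diversity).
    """
    selected = list(range(len(correct)))
    best = entropy_diversity([correct[i] for i in selected])
    while len(selected) > min_size:
        candidates = []
        for i in selected:
            rest = [k for k in selected if k != i]
            score = entropy_diversity([correct[k] for k in rest])
            candidates.append((score, rest))
        score, rest = max(candidates, key=lambda c: c[0])
        if score <= best:
            break  # no removal increases diversity any further
        best, selected = score, rest
    return selected, best


# Toy correctness matrix: classifiers 0 and 1 are redundant copies.
correct = [
    [True, True, False, False],
    [True, True, False, False],
    [False, False, True, True],
    [True, False, True, False],
]
kept, diversity = select_by_entropy(correct)
print(kept, diversity)
```

In this toy run one of the two redundant classifiers is pruned, which mirrors the abstract's claim that diversity-driven selection can shrink the ensemble and reduce overhead. A practical version would presumably also account for accuracy and would rerun the selection as new data chunks arrive; neither is shown here.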