In soft-sensing modeling, estimation accuracy is commonly improved by partitioning the original data set into classes and building a sub-model for each class, so a multi-model soft sensor depends on the quality of the classification step. The naive Bayesian classifier is adopted for this step because it is simple and efficient. The continuous class variable is first divided into several category ranges, and the "3σ" rule from probability theory is then used to discretize the continuous attribute variables. To eliminate the influence of interfering data in the training samples, a genetic algorithm selects an optimal subset of the training sample set. Discretization of the continuous variables and sample selection together form the data preprocessing stage, and the preprocessed training samples are used to build the Bayesian classifier. Simulation experiments were carried out on UCI data sets and on online monitoring data from a bisphenol A (BPA) production process. The results show that the naive Bayesian classifier combining "3σ"-rule discretization with genetic-algorithm sample selection achieves higher classification accuracy than the other methods compared.
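The preprocessing pipeline summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the exact form of the "3σ" rule, so the sketch assumes cut points at μ + kσ for k = -3, ..., 3. The fitness function (accuracy on a held-out validation split), the genetic operators, the class thresholds, and all function names and parameter values are likewise hypothetical choices, and the data are synthetic.

```python
import numpy as np

def three_sigma_edges(x):
    """Cut points at mu + k*sigma, k = -3..3 (assumed reading of the 3-sigma rule)."""
    return x.mean() + x.std() * np.arange(-3, 4)

def discretize(X, edges):
    """Replace each continuous attribute value by its interval index (0..7)."""
    return np.column_stack([np.digitize(X[:, j], edges[j]) for j in range(X.shape[1])])

class DiscreteNaiveBayes:
    """Naive Bayes over integer-coded attributes with Laplace smoothing."""

    def fit(self, X, y, n_bins=8, alpha=1.0):
        self.classes_ = np.unique(y)
        self.log_prior_ = np.log([(y == c).mean() for c in self.classes_])
        # log_lik_[c, j, v] = log P(attribute j = v | class c)
        self.log_lik_ = np.empty((len(self.classes_), X.shape[1], n_bins))
        for ci, c in enumerate(self.classes_):
            for j in range(X.shape[1]):
                counts = np.bincount(X[y == c, j], minlength=n_bins) + alpha
                self.log_lik_[ci, j] = np.log(counts / counts.sum())
        return self

    def predict(self, X):
        # Add the per-attribute log-likelihoods to the log-prior of each class.
        scores = self.log_prior_[:, None] + sum(
            self.log_lik_[:, j, X[:, j]] for j in range(X.shape[1]))
        return self.classes_[np.argmax(scores, axis=0)]

def fitness(mask, X_tr, y_tr, X_val, y_val):
    """Validation accuracy of a classifier trained on the masked samples (assumed fitness)."""
    if not mask.any():
        return 0.0
    model = DiscreteNaiveBayes().fit(X_tr[mask], y_tr[mask])
    return float((model.predict(X_val) == y_val).mean())

def select_samples(X_tr, y_tr, X_val, y_val, pop_size=20, generations=15,
                   p_mut=0.02, seed=0):
    """Small GA over boolean sample masks: elitism, binary tournament selection,
    uniform crossover, and bit-flip mutation (all illustrative choices)."""
    rng = np.random.default_rng(seed)
    n = len(y_tr)
    pop = rng.random((pop_size, n)) < 0.8
    pop[0] = True  # include the full training set as one candidate
    for _ in range(generations):
        fit = np.array([fitness(m, X_tr, y_tr, X_val, y_val) for m in pop])
        new_pop = [pop[np.argmax(fit)].copy()]  # elitism: carry over the best mask
        while len(new_pop) < pop_size:
            a, b = (max(rng.choice(pop_size, 2), key=lambda i: fit[i]) for _ in range(2))
            child = np.where(rng.random(n) < 0.5, pop[a], pop[b])  # uniform crossover
            child ^= rng.random(n) < p_mut  # bit-flip mutation
            new_pop.append(child)
        pop = np.array(new_pop)
    fit = np.array([fitness(m, X_tr, y_tr, X_val, y_val) for m in pop])
    return pop[np.argmax(fit)], float(fit.max())

# Synthetic demonstration data (not from the paper).
rng = np.random.default_rng(1)
target = rng.normal(size=300)                       # continuous class variable
X = target[:, None] + rng.normal(scale=0.5, size=(300, 3))
y = np.digitize(target, [-0.5, 0.5])                # three class ranges, thresholds illustrative
edges = [three_sigma_edges(X[:200, j]) for j in range(3)]  # fitted on training rows only
Xd = discretize(X, edges)
mask, acc = select_samples(Xd[:200], y[:200], Xd[200:], y[200:])
```

Because the initial population contains the full training set and the best mask is carried over unchanged in every generation, the validation accuracy returned by `select_samples` is never lower than that of the classifier trained on all samples. In practice the selected subset should be judged on a separate test set, since the fitness here is measured on the same validation split that guides the search.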