Traditional least squares support vector machines (LSSVM) yield non-sparse solutions. To address this problem, a sparsification and parameter optimization method for LSSVM based on a genetic algorithm is proposed. The basic idea of the sparsification is to assign a probability value to each training sample; a sample whose probability is less than 0.5 is not treated as a support vector and is used as a test sample instead. The full training set is thus divided into a test set and a retained training set. A fitness function comprising the sparsity rate, the training error, and the test error is defined. In each individual of the population, the first N dimensions encode the probability assigned to each sample, and the last m dimensions encode the parameters to be optimized. All of these variables are optimized jointly through selection, crossover, and mutation, and the LSSVM model is built from the retained training samples and the optimized parameters of the individual with the minimum fitness. The method was applied to soft sensing of the 4-CBA content in the PX oxidation process. Simulation results with industrial data show that the proposed method reaches a sparsity rate of 87%, selects the kernel parameters automatically, and generalizes better than the model built before sparsification.
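The encoding and fitness evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes an RBF kernel with m = 2 parameters (regularization `gamma` and kernel width `sigma`), RMSE as the error measure, and an equally weighted sum of the three fitness terms; the abstract specifies none of these choices. All function names are hypothetical, and the genetic operators (selection, crossover, mutation) are omitted.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    # Pairwise RBF kernel between the rows of A and the rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_fit(X, y, gamma, sigma):
    # Standard LSSVM regression: solve the bordered linear system
    # [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y].
    n = len(y)
    M = np.zeros((n + 1, n + 1))
    M[0, 1:] = 1.0
    M[1:, 0] = 1.0
    M[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(M, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]  # bias b, coefficients alpha

def lssvm_predict(X_new, X_sv, b, alpha, sigma):
    return rbf_kernel(X_new, X_sv, sigma) @ alpha + b

def decode(individual, n_samples):
    # First N genes: per-sample probabilities; remaining m genes: parameters.
    probs = individual[:n_samples]
    params = individual[n_samples:]
    keep = probs >= 0.5  # probability < 0.5 -> sample moves to the test set
    return keep, params

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

def fitness(individual, X, y, w=(1.0, 1.0, 1.0)):
    # Assumed form: weighted sum of (1 - sparsity rate), training RMSE
    # and test RMSE, to be minimized. The weights w are illustrative.
    keep, (gamma, sigma) = decode(individual, len(y))
    if keep.sum() < 2:
        return np.inf  # too few retained samples to build a model
    b, alpha = lssvm_fit(X[keep], y[keep], gamma, sigma)
    train_err = rmse(lssvm_predict(X[keep], X[keep], b, alpha, sigma), y[keep])
    if (~keep).any():
        test_err = rmse(lssvm_predict(X[~keep], X[keep], b, alpha, sigma), y[~keep])
    else:
        test_err = 0.0
    sparsity_rate = 1.0 - keep.mean()  # fraction of samples removed
    return w[0] * (1.0 - sparsity_rate) + w[1] * train_err + w[2] * test_err
```

A genetic algorithm would evaluate `fitness` for every individual in the population, with `gamma` and `sigma` constrained to positive values, and keep the individual with the smallest value. The retained samples and parameters decoded from that individual then define the final sparse model.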