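In near-infrared spectroscopic quantitative analysis, the accuracy with which the chemical values of samples are measured is the theoretical limit on the accuracy of quantitative analysis with mathematical models. However, the number of samples whose chemical values can be obtained accurately is relatively small, and many models use only these samples during modeling while ignoring the large number of samples without chemical values. To address this problem, this paper proposes, on the basis of LS-SVR, a semi-supervised LS-SVR (S^2LS-SVR) model that can use sample data both with and without chemical values (labels). Like LS-SVR, the model requires solving only a single linear system. Finally, with a flue-cured tobacco sample data set as the experimental material, quantitative analysis models were built for four sample components (total sugar, reducing sugar, total nitrogen, and nicotine). The average errors between predicted and actual values for the four components are 6.62%, 7.56%, 6.11%, and 8.20%, and the correlation coefficients are 0.9741, 0.9733, 0.9230, and 0.9486, respectively. Comparative analysis shows that the S^2LS-SVR model outperforms PLS and LS-SVR, which verifies the feasibility and effectiveness of the S^2LS-SVR model.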
In near-infrared spectral quantitative analysis, the precision of the measured chemical values of the samples is the theoretical limit on the precision of quantitative analysis with mathematical models. However, only a small number of samples have accurately measured chemical values. Many models discard the large number of samples without chemical values and use only the samples with chemical values when modeling the contents of sample components. To address this problem, a semi-supervised LS-SVR (S^2LS-SVR) model is proposed on the basis of LS-SVR; it can use samples without chemical values as well as those with chemical values. As with LS-SVR, training this model is equivalent to solving a linear system. Finally, samples of flue-cured tobacco were taken as the experimental material, and quantitative analysis models were constructed for the contents of four sample components (total sugar, reducing sugar, total nitrogen, and nicotine) with PLS regression, LS-SVR, and S^2LS-SVR. For the S^2LS-SVR model, the average relative errors between the actual and predicted values of the four component contents are 6.62%, 7.56%, 6.11%, and 8.20%, respectively, and the correlation coefficients are 0.9741, 0.9733, 0.9230, and 0.9486, respectively. Experimental results show that the S^2LS-SVR model outperforms the other two, which verifies the feasibility and effectiveness of the S^2LS-SVR model.
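The statement that training reduces to solving a linear system can be illustrated with the standard supervised LS-SVR, whose dual problem is the system [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]. The sketch below is a minimal illustration under stated assumptions and is not the S^2LS-SVR model itself: the abstract does not give the semi-supervised formulation, so only the supervised baseline is shown. The RBF kernel, the parameter names `gamma` and `sigma`, and all function names are hypothetical choices made for this example. The two helper metrics correspond to the average relative error and correlation coefficient reported above, using their usual definitions.

```python
import numpy as np


def rbf_kernel(A, B, sigma=1.0):
    """Gaussian (RBF) kernel matrix between rows of A and rows of B (assumed kernel)."""
    d2 = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))


def lssvr_fit(X, y, gamma=10.0, sigma=1.0):
    """Train a supervised LS-SVR by solving one (n+1) x (n+1) linear system."""
    n = X.shape[0]
    K = rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                      # constraint row: sum(alpha) = 0
    A[1:, 0] = 1.0                      # bias column
    A[1:, 1:] = K + np.eye(n) / gamma   # regularized kernel matrix
    rhs = np.concatenate(([0.0], y))
    solution = np.linalg.solve(A, rhs)
    return solution[0], solution[1:]    # bias b, dual coefficients alpha


def lssvr_predict(X_train, b, alpha, X_new, sigma=1.0):
    """Evaluate f(x) = sum_i alpha_i * k(x, x_i) + b for each row of X_new."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b


def average_relative_error(y_true, y_pred):
    """Mean of |predicted - actual| / |actual|, returned as a fraction."""
    return float(np.mean(np.abs(y_pred - y_true) / np.abs(y_true)))


def correlation_coefficient(y_true, y_pred):
    """Pearson correlation between actual and predicted values."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])
```

In use, `X` would hold one spectrum per row and `y` the measured content of one component (for example total sugar); `lssvr_fit` returns the bias and dual coefficients, and `lssvr_predict` applies them to new spectra. A semi-supervised variant would add a term involving the unlabeled samples to this system, but its exact form is not stated in the abstract and is not reproduced here.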