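Variable selection plays an important role in quantitative analysis using visible and near-infrared (Vis-NIR) spectroscopy. The prediction mechanisms of different soil samples may differ considerably. When the samples to be measured carry new characteristic information, the effective variables selected from the calibration set may not represent the useful information in those samples well, and continuing to model with the original variables tends to increase the prediction error. In this study, recursive variable selection methods were used to update the effective variables for soil total nitrogen (TN) and organic matter (OM) recursively during the prediction process, so as to maintain the robustness of the prediction models. The prediction performance for soil TN and OM content was compared among partial least squares (PLS), recursive partial least squares (RPLS), and different recursive variable selection methods, namely variable importance in the projection combined with RPLS (VIP-RPLS), VIP-PLS, and uninformative variable elimination combined with PLS (UVE-PLS). The 195 soil samples were collected from farmland in eight townships of Wencheng County, Zhejiang Province. The samples were randomly divided into two parts: a calibration set of 120 samples and a prediction set of 75 samples. The results show that the VIP-RPLS model gave the best predictions of soil TN and OM content, with coefficients of determination (R²) of 0.85 and 0.86 and residual prediction deviations (RPD) of 2.6 and 2.7, respectively. This indicates that, by continually updating the effective variables of the model, VIP-RPLS can capture the useful information in samples newly added to the calibration set. Compared with the other methods in this study, VIP-RPLS gives higher prediction accuracy for soil TN and OM content.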
In the present work, recursive variable selection methods, which update both the model coefficients and the effective variables during the prediction process, were applied to maintain the predictive ability of calibration models. This work compared the performance of partial least squares (PLS), recursive PLS (RPLS), and three recursive variable selection methods, namely variable importance in the projection combined with RPLS (VIP-RPLS), VIP-PLS, and uninformative variable elimination combined with PLS (UVE-PLS), for the measurement of soil total nitrogen (TN) and organic matter (OM) using Vis-NIR spectroscopy. The dataset consisted of 195 soil samples collected from eight towns in Wencheng County, Zhejiang Province, China. The entire dataset was split randomly into a calibration set of 120 samples and a prediction set of 75 samples. The best prediction results were obtained by the VIP-RPLS model: the coefficients of determination (R²) were 0.85 and 0.86, and the residual prediction deviations (RPD) were 2.6 and 2.7, for soil TN and OM, respectively. The results indicate that VIP-RPLS is able to capture the effective information from the latest modeling sample by recursively updating the effective variables. The proposed VIP-RPLS method performs better for Vis-NIR prediction of soil TN and OM than the other methods compared in this work.
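The recursive idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several things in it are assumptions: the function names (`pls1_fit`, `vip_scores`, `recursive_vip_predict`, `r2_rpd`), the VIP cut-off of 1.0, the number of latent variables, and the premise that the reference value of each predicted sample becomes available before the next prediction. For simplicity the sketch refits an ordinary NIPALS PLS1 model on the augmented calibration set at every step, which stands in for a true RPLS update. R² is computed as 1 − SSE/SST and RPD as the standard deviation of the reference values divided by the RMSEP; the abstract does not state which definitions were used.

```python
import numpy as np

def pls1_fit(X, y, n_comp):
    """NIPALS PLS1 on mean-centred data (single response)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    n, p = X.shape
    W, P = np.zeros((p, n_comp)), np.zeros((p, n_comp))
    T, q = np.zeros((n, n_comp)), np.zeros(n_comp)
    for a in range(n_comp):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)        # unit-length weight vector
        t = Xr @ w                       # scores
        tt = t @ t
        p_a = Xr.T @ t / tt              # X loadings
        q[a] = yr @ t / tt               # y loading
        Xr = Xr - np.outer(t, p_a)       # deflate X
        yr = yr - q[a] * t               # deflate y
        W[:, a], P[:, a], T[:, a] = w, p_a, t
    b = W @ np.linalg.solve(P.T @ W, q)  # regression vector for centred X
    return {"W": W, "T": T, "q": q, "b": b, "x_mean": x_mean, "y_mean": y_mean}

def vip_scores(model):
    """Variable importance in the projection for a PLS1 model."""
    ss = model["q"] ** 2 * np.sum(model["T"] ** 2, axis=0)  # y variance per component
    p = model["W"].shape[0]
    return np.sqrt(p * (model["W"] ** 2 @ ss) / ss.sum())

def recursive_vip_predict(X_cal, y_cal, X_new, y_new, n_comp=3, vip_cut=1.0):
    """Predict samples one at a time, reselecting variables before each one."""
    preds = []
    for x, y_ref in zip(X_new, y_new):
        full = pls1_fit(X_cal, y_cal, n_comp)
        keep = vip_scores(full) >= vip_cut          # assumed cut-off
        k = min(n_comp, int(keep.sum()))
        m = pls1_fit(X_cal[:, keep], y_cal, k)      # model on selected variables
        preds.append(m["y_mean"] + (x[keep] - m["x_mean"]) @ m["b"])
        # Assumption: the reference value is measured afterwards and the
        # sample joins the calibration set (full refit, not a true RPLS update).
        X_cal = np.vstack([X_cal, x])
        y_cal = np.append(y_cal, y_ref)
    return np.array(preds)

def r2_rpd(y_true, y_pred):
    """R2 as 1 - SSE/SST and RPD as SD(reference) / RMSEP."""
    resid = y_true - y_pred
    rmsep = np.sqrt(np.mean(resid ** 2))
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return r2, np.std(y_true, ddof=1) / rmsep
```

With spectra in an array `X` (samples by wavelengths) and reference values in `y`, a 120/75 split like the one described above would be evaluated as `r2_rpd(y[120:], recursive_vip_predict(X[:120], y[:120], X[120:], y[120:]))`. Because RPD is a ratio of two quantities in the same units, it is dimensionless.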