To improve the prediction accuracy and modeling efficiency of quantitative near-infrared (NIR) spectroscopic analysis, a wavelength variable selection method based on SIMPLISMA (simple-to-use interactive self-modeling mixture analysis) is proposed. Wavelength variables carrying useful information are selected according to their purity and standard deviation values, and a correlation weight function is introduced to address collinearity between variables. Quantitative calibration models are built from the iteratively selected variables, and the root mean square error of cross-validation (RMSECV) is used to determine the optimal number of wavelength variables. The method was applied to two sets of NIR spectral data with varying glucose content: a four-component aqueous glucose solution experiment and a human plasma experiment. In both data sets, only 0.3% of all variables were selected to build the calibration model, and the root mean square error of prediction (RMSEP) for glucose concentration in the validation set was reduced to 66.9 mg/dL (669 mg/L) and 1.5 mg/dL (15 mg/L), respectively. Compared with calibration models built on the full spectral range and on a preselected informative band, the proposed method minimizes redundant information through wavelength variable selection and improves both prediction accuracy and modeling efficiency.
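The selection step described above can be illustrated with a minimal sketch. It assumes the commonly cited SIMPLISMA formulation, in which the purity of wavelength j is its standard deviation divided by its mean plus a small offset, and the correlation weight is the determinant of the correlation-about-the-origin matrix formed by wavelength j and the wavelengths already selected. The paper's exact weight function is not given in the abstract, so this is an assumption. The function names, the `offset_fraction` parameter, and the toy spectra are hypothetical, and the RMSECV step that fixes the number of variables is omitted.

```python
import math

def column_stats(X):
    """Mean and population standard deviation of each column (wavelength)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n)
            for j in range(p)]
    return means, stds

def determinant(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    n = len(A)
    det = 1.0
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(A[r][k]))
        if abs(A[pivot][k]) < 1e-15:
            return 0.0
        if pivot != k:
            A[k], A[pivot] = A[pivot], A[k]
            det = -det
        det *= A[k][k]
        for r in range(k + 1, n):
            f = A[r][k] / A[k][k]
            for c in range(k, n):
                A[r][c] -= f * A[k][c]
    return det

def select_wavelengths(X, n_select, offset_fraction=0.05):
    """Pick n_select column indices by weighted purity (assumed formulation)."""
    n, p = len(X), len(X[0])
    means, stds = column_stats(X)
    alpha = offset_fraction * max(means)  # offset damps noisy low-mean columns
    scale = [math.sqrt(m * m + (s + alpha) ** 2) for m, s in zip(means, stds)]
    Z = [[row[j] / scale[j] for j in range(p)] for row in X]

    def corr(a, b):
        # Correlation about the origin between scaled columns a and b.
        return sum(z[a] * z[b] for z in Z) / n

    selected = []
    for _ in range(n_select):
        best_j, best_purity = None, -1.0
        for j in range(p):
            if j in selected:
                continue
            idx = [j] + selected
            # Weight is near zero when column j is collinear with earlier picks.
            weight = determinant([[corr(a, b) for b in idx] for a in idx])
            purity = weight * stds[j] / (means[j] + alpha)
            if purity > best_purity:
                best_j, best_purity = j, purity
        selected.append(best_j)
    return selected

# Toy spectra: 4 samples x 4 wavelengths. Column 1 is exactly twice column 0
# (collinear), and column 3 is constant (zero standard deviation).
X = [
    [1.0, 2.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
print(select_wavelengths(X, 2))  # prints [1, 2]
```

In this toy case column 1 is chosen first, after which the determinant weight drives the purity of the collinear column 0 to approximately zero, so column 2 is chosen second. In a full workflow, a calibration model would be refit after each added variable, and the count that minimizes RMSECV would be kept.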