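The concept and method of the maximum linearly independent group were introduced into near-infrared spectral analysis to address the selection of representative samples, that is, calibration set samples, when building quantitative analysis models. Of 2652 tobacco powder samples used as experimental material, 1001 were randomly selected to form the prediction set, and the remaining 1651 served as the candidate set of representative samples. The maximum linearly independent group of the candidate set's spectral matrix was computed with Matlab, and its members were taken as the representative samples forming the calibration set. A prediction model for the quantitative analysis of total sugar content in tobacco powder samples was built by PLS regression and applied to the 1001 samples in the prediction set. The results show that when the selected calibration set contains more than 32 samples, the mean relative error of each model on the prediction set is below 4% and the mean correlation coefficient is above 0.96. The total sugar contents predicted for the prediction set by the model built from 32 representative samples and by the model built from 146 representative samples showed no significant difference under a statistical test (α = 0.05), indicating that the maximum linearly independent group method achieves a "few but well-chosen" selection of calibration samples. In addition, calibration sets of the same sizes (28, 32, 41, 76, 146 and 163 samples) were selected both by the maximum linearly independent group method and at random, and the resulting models were compared; the models built from calibration sets selected by the maximum linearly independent group method predicted better than those built from randomly selected calibration sets.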
In this paper, a simple but novel method based on the maximum linearly independent group is introduced into near-infrared (NIR) spectral analysis for selecting representative calibration samples. The experimental material comprised 2652 tobacco powder samples, of which 1001 were randomly selected as the prediction set; the remaining samples formed the candidate set from which the calibration set was selected. Representative samples were selected by locating the maximum linearly independent vectors among the spectral vectors of the candidate set. The computation was carried out with the function rref(X, q) in Matlab, and the maximum linearly independent spectral vectors were taken as the calibration set. Different values of the calculation precision q yielded different numbers of representative samples. The selected calibration set was used to build a PLS regression model for predicting the total sugar content of tobacco powder samples, and the model was applied to the 1001 samples in the prediction set. With 32 representative samples, the model showed good predictive accuracy, with a mean relative prediction error of 3.6210% and a correlation coefficient of 0.9643. A paired-samples t-test showed no significant difference between the predictions of the model built from 32 samples and those of the model built from 146 samples (α = 0.05). The proposed selection method was also compared with random selection of calibration samples in terms of the predictive performance of the resulting models. For this comparison, six calibration sets containing 28, 32, 41, 76, 146 and 163 samples were selected by each method. Selection by maximum linear independence gave clearly better predictions than random selection.
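The selection step described above can be illustrated with a minimal Python sketch. It is not the authors' Matlab code: it assumes that each sample spectrum is placed as one column of a matrix, that Gauss-Jordan elimination is run with an absolute pivot tolerance (playing the role of the precision q), and that the pivot columns are taken as the representative samples. The function name `select_independent_samples` and the parameter `tol` are hypothetical.

```python
from typing import List, Sequence


def select_independent_samples(
    spectra: Sequence[Sequence[float]], tol: float = 1e-8
) -> List[int]:
    """Return indices of samples forming a maximal linearly independent set.

    Assumption: a column is treated as dependent when no candidate pivot
    in it exceeds `tol` in absolute value after elimination.
    """
    if not spectra:
        return []
    n_samples = len(spectra)
    n_points = len(spectra[0])
    # Build the matrix with one sample spectrum per column.
    a = [[float(spectra[j][i]) for j in range(n_samples)] for i in range(n_points)]

    pivots: List[int] = []
    row = 0
    for col in range(n_samples):
        if row >= n_points:
            break  # rank cannot exceed the number of spectral points
        # Partial pivoting: largest remaining entry in this column.
        best = max(range(row, n_points), key=lambda r: abs(a[r][col]))
        if abs(a[best][col]) <= tol:
            continue  # negligible column: sample depends on earlier picks
        a[row], a[best] = a[best], a[row]
        pivot = a[row][col]
        a[row] = [v / pivot for v in a[row]]
        # Eliminate this column from every other row.
        for r in range(n_points):
            if r != row and a[r][col] != 0.0:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[row])]
        pivots.append(col)
        row += 1
    return pivots


# Toy spectra with three points each; samples 1 and 3 are combinations of others.
toy = [[1, 0, 0], [2, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
selected = select_independent_samples(toy)

# A looser tolerance treats nearly collinear spectra as dependent.
near = [[1.0, 0.0], [1.0, 0.001]]
strict = select_independent_samples(near, tol=1e-8)
loose = select_independent_samples(near, tol=0.01)
```

In this sketch a larger tolerance discards more nearly collinear spectra, which is consistent with the observation that different values of q give different numbers of representative samples.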
The result indicated that the proposed method can not only effectively enhance the cost-effectiveness of NIR spectral analysis by reducing the numbe