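High-resolution extractive electrospray ionization mass spectrometry (EESI-MS) was used to rapidly analyze exhaled breath samples from liver failure patients and healthy volunteers. Multi-block partial least squares (MB-PLS) was then applied to statistically model the breath metabolite data acquired in multiple batches, and the results were compared with those of the conventional PLS method. The results show that MB-PLS effectively removes the influence of batch differences on statistical modeling. In addition, screening variables by their variable importance in the projection (VIP) values in the MB-PLS model reduces data redundancy and eliminates the influence of irrelevant variables on the model, thereby effectively improving model performance.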
In metabolomics studies, the number of samples must be large enough to guarantee reliable statistical analysis. However, exhaled breath can be stored for only a short time, so it is difficult to collect and analyze a large number of breath samples within that window. Combining multiple batches of samples yields a larger data set, but there is usually a large between-batch variance induced by variation in the ambient air. In this paper, exhaled breath data from liver failure patients and healthy volunteers were acquired by high-resolution extractive electrospray ionization mass spectrometry (EESI-MS) and then analyzed by multi-block partial least squares (MB-PLS). Compared with the traditional PLS method, MB-PLS proved effective at removing batch variance from the model. Moreover, we propose a variable selection strategy based on the variable importance in the projection (VIP) of the MB-PLS model, which reduces data redundancy and eliminates the effect of uninformative variables on the model; with this selection, the performance of the MB-PLS model improved greatly.
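The VIP-based variable selection mentioned above can be illustrated with a minimal sketch. It makes several assumptions that are not in the paper: it uses ordinary single-block PLS with one response (fitted by NIPALS) in place of MB-PLS, it runs on synthetic data, and it applies the commonly used cutoff VIP > 1, whereas the abstract does not state a threshold. The function names `pls1_nipals` and `vip_scores` are hypothetical.

```python
import numpy as np

def pls1_nipals(X, y, n_components):
    """Single-response PLS by NIPALS (illustrative stand-in, not MB-PLS).

    Returns unit-norm weights W, scores T and y-loadings q.
    """
    X = X - X.mean(axis=0)          # column-center predictors
    y = y - y.mean()                # center response
    n, p = X.shape
    W = np.zeros((p, n_components))
    T = np.zeros((n, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)      # unit-norm weight vector
        t = X @ w                   # scores for this component
        tt = t @ t
        p_load = X.T @ t / tt       # X-loadings used for deflation
        q[a] = (y @ t) / tt         # y-loading
        X = X - np.outer(t, p_load) # deflate X
        y = y - q[a] * t            # deflate y
        W[:, a], T[:, a] = w, t
    return W, T, q

def vip_scores(W, T, q):
    """VIP_j = sqrt(p * sum_a(SS_a * w_ja^2) / sum_a(SS_a)), SS_a = q_a^2 * t_a't_a."""
    p = W.shape[0]
    ss = q**2 * np.sum(T**2, axis=0)  # y-variance explained per component
    return np.sqrt(p * (W**2 @ ss) / ss.sum())

# Synthetic data (assumption): only variables 0 and 1 carry information.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 20))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.5, size=60)

W, T, q = pls1_nipals(X, y, n_components=2)
vip = vip_scores(W, T, q)
selected = np.flatnonzero(vip > 1.0)  # assumed cutoff, not from the paper
print("variables kept:", selected)
```

Because the weight vectors have unit norm, the squared VIP values average to 1 across variables, which is why 1 is a common cutoff. In a workflow like the one described, the model would then be refitted on the retained variables only.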