A novel adaptive binning method is proposed for the preprocessing of NMR metabonomic data. The statistical properties of each spectral data point are first estimated, and contiguous data points are then integrated adaptively according to the statistical discrepancy between neighboring points. Compared with the widely used fixed-width (equidistant) binning, the proposed method avoids several potential drawbacks: signals with opposite statistical discrepancies being superposed in the same bin and cancelling each other, weak characteristic signals being masked, and the signal-to-noise ratio of the spectrum being reduced. It thereby avoids the negative effects of these drawbacks on subsequent statistical analysis. To compare the two preprocessing approaches, both computer-simulated NMR data and experimental spectra from individuals with different diets (dietary intervention) were analyzed. The results show that the proposed method effectively reduces the influence of spectral noise and of non-specific signals (signals without statistical significance). It improves the reliability and interpretability of the subsequent principal component analysis (PCA) results, so that the metabonomic analysis is more biologically meaningful.
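The idea of merging contiguous points by their statistical behavior can be illustrated with a minimal sketch. The abstract does not state which per-point statistic or merging rule is used, so everything below is an assumption made for illustration only: a two-group design, Welch's t statistic as the per-point measure, a hypothetical `threshold` of 2.0, and hypothetical function names (`welch_t`, `adaptive_bins`, `integrate`). It is not the authors' implementation.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples (assumed choice of statistic)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0.0:
        # Zero variance in both samples: no difference, or an unbounded one.
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / se


def classify(t, threshold):
    """Label a point: +1 (higher in group A), -1 (higher in group B), 0 (neither)."""
    if t > threshold:
        return 1
    if t < -threshold:
        return -1
    return 0


def adaptive_bins(group_a, group_b, threshold=2.0):
    """Merge contiguous points that share the same label.

    group_a, group_b: lists of spectra (equal-length lists of intensities).
    Returns a list of (start, end, label) with half-open index ranges.
    """
    n_points = len(group_a[0])
    labels = [
        classify(welch_t([s[i] for s in group_a], [s[i] for s in group_b]), threshold)
        for i in range(n_points)
    ]
    bins, start = [], 0
    for i in range(1, n_points + 1):
        # Close the current bin at the end of the spectrum or when the label changes.
        if i == n_points or labels[i] != labels[start]:
            bins.append((start, i, labels[start]))
            start = i
    return bins


def integrate(spectrum, bins):
    """Sum the intensities of one spectrum inside each bin."""
    return [sum(spectrum[s:e]) for s, e, _ in bins]


# Toy data: points 0-1 are noise, 2-3 are higher in A, 4-5 are higher in B.
group_a = [[1.0, 1.1, 5.0, 5.2, 1.0, 0.9],
           [1.1, 0.9, 5.1, 5.0, 1.1, 1.0],
           [0.9, 1.0, 4.9, 5.1, 0.9, 1.1]]
group_b = [[1.0, 0.9, 1.0, 1.1, 5.0, 5.1],
           [0.9, 1.1, 1.1, 0.9, 5.2, 4.9],
           [1.1, 1.0, 0.9, 1.0, 4.9, 5.0]]

bins = adaptive_bins(group_a, group_b)
binned_a0 = integrate(group_a[0], bins)
```

In this toy case a fixed-width bin spanning points 2 to 5 would add a region that is higher in one group to a region that is higher in the other, which is the cancellation effect the abstract describes. The label-based merge keeps the two regions in separate bins.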