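In microarray experiments, correlations between gene expression levels play a very important role in inferring relationships between genes. In microarray data that have not been normalized, genes often show strong correlations with one another; some of these high correlations arise from changes in gene expression levels, while others arise from systematic bias. One purpose of normalizing microarray data is to remove the high correlations caused by systematic bias while preserving the high correlations in expression levels that have genuine biological causes. Although many comparative studies of normalization methods exist, few have examined how normalization affects the correlation coefficients between genes, or which method best restores the correlation structure among genes. Using simulated gene expression data, this study compares the performance of several commonly used normalization methods and identifies the one that best restores the correlation structure among genes.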
The correlation coefficient between the expression levels of two genes plays an important role in inferring their relationship in microarray experiments. Before normalization, gene expression data often show high correlation coefficients among a large proportion of genes. Some of these high correlations are caused by changes in gene expression levels, but most are caused by systematic errors. Normalization is intended to eliminate the spurious high correlations induced by systematic errors while preserving the high correlation coefficients that stem from gene interactions. Although a number of studies have compared normalization methods, little work has evaluated the effect of normalization procedures on correlation coefficients among genes, or determined which method best restores the gene correlation structure. Here, gene expression data were simulated with reference to real-world gene expression data, and these simulated data were used to determine which normalization method best restores the gene correlation structure. In addition, the simulated data and the real-world data were shown to have the same gene correlation structure, so the conclusions drawn from the simulated data can be applied to real-world data. Of the five normalization methods compared, the loess method was the most appropriate for eliminating spurious correlation coefficients.
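The mechanism described above, in which a systematic error shared by all genes on an array inflates gene-gene correlations, can be illustrated with a small sketch. This is not the simulation used in the study. It assumes independent genes with standard normal log-expression, a purely additive per-array bias, and a simple per-array median centering in place of loess. All function names and parameter values (`simulate`, `median_center`, `bias_sd`, and so on) are hypothetical and chosen only for illustration.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mean_abs_correlation(data):
    """Mean |r| over all gene pairs; data[g][a] is gene g on array a."""
    n = len(data)
    rs = [abs(pearson(data[i], data[j]))
          for i in range(n) for j in range(i + 1, n)]
    return statistics.fmean(rs)


def median_center(data):
    """Subtract each array's median over genes (a simple global
    normalization, used here as a stand-in; this is NOT loess)."""
    n_arrays = len(data[0])
    medians = [statistics.median(row[a] for row in data)
               for a in range(n_arrays)]
    return [[row[a] - medians[a] for a in range(n_arrays)] for row in data]


def simulate(n_genes=60, n_arrays=30, bias_sd=2.0, seed=0):
    """Assumed toy model: independent genes plus an additive array bias."""
    rng = random.Random(seed)
    # True log-expression: genes are independent, so true correlations
    # are near zero apart from sampling noise.
    true = [[rng.gauss(0.0, 1.0) for _ in range(n_arrays)]
            for _ in range(n_genes)]
    # Systematic error: one offset per array, shared by every gene on it.
    bias = [rng.gauss(0.0, bias_sd) for _ in range(n_arrays)]
    observed = [[true[g][a] + bias[a] for a in range(n_arrays)]
                for g in range(n_genes)]
    return true, observed


if __name__ == "__main__":
    true, observed = simulate()
    print("true data:      ", round(mean_abs_correlation(true), 3))
    print("with array bias:", round(mean_abs_correlation(observed), 3))
    print("median-centered:",
          round(mean_abs_correlation(median_center(observed)), 3))
```

In this toy setting the shared array offset alone produces high correlations between unrelated genes, and removing it brings the mean absolute correlation back close to that of the bias-free data. Real systematic errors are often intensity-dependent, which an additive offset cannot capture and which is the kind of bias that intensity-dependent methods such as loess are designed to handle.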