Given the high cost of microarray experiments, the first consideration in experimental design is how many replicates are needed to detect a significantly differentially expressed gene. Calculating the number of replicates (sample size) or the power required by a multiple-testing procedure therefore provides important guidance for the design of microarray experiments. To this end, we constructed a mixture distribution model of gene expression noise based on permutation resampling. The method is suitable for all types of gene expression data, whether the expression noise arises from a single source or from multiple sources. By combining the mixture model with a multiple-testing procedure, researchers can obtain the minimum number of biological replicates required for a given statistical power, determine the power to detect a significantly differentially expressed gene for a given sample size, or choose the best testing method for a given sample size and power. As an example, the method was fitted to an observed microarray dataset of 3000 genes responsive to stroke in rat, which yielded a single-distribution model of the t-statistic (that is, a model with a single source of expression noise). This model was then used to calculate the power of four popular multiple-testing procedures to detect a gene with an expression change D. The results show that, for a small change D, the B-procedure had the lowest power among the four procedures, whereas the BH-procedure had the highest. However, all four procedures had the same power to identify the gene with the largest expression change. As with a conventional single test, the power of the BH-procedure to detect a small change does not vary as the number of genes increases, whereas the powers of the other three multiple-testing procedures decline as the number of genes increases.
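The link between replicate number, power, and the number of genes tested can be illustrated with a minimal sketch. It is not the permutation-based mixture model described above: it swaps in a normal approximation to a two-sample test and assumes that the "B-procedure" refers to a Bonferroni correction, which the abstract does not spell out. The function names and the parameters `sigma` (noise standard deviation), `m` (number of genes), and `target` (desired power) are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def power_two_sided(d, sigma, n, alpha):
    """Approximate power of a two-sided two-sample z-test.

    d     : true expression change between the two groups
    sigma : assumed noise standard deviation (same in both groups)
    n     : replicates per group
    alpha : per-test significance level
    """
    # Standardized effect: difference of means over its standard error.
    delta = d / (sigma * (2.0 / n) ** 0.5)
    z_crit = _STD.inv_cdf(1.0 - alpha / 2.0)
    # Probability that the test statistic falls in either rejection tail.
    return _STD.cdf(delta - z_crit) + _STD.cdf(-delta - z_crit)


def bonferroni_power(d, sigma, n, alpha, m):
    """Power for one gene when each of m tests is run at level alpha / m."""
    return power_two_sided(d, sigma, n, alpha / m)


def min_replicates(d, sigma, alpha, m, target=0.8, n_max=1000):
    """Smallest n per group reaching the target power, or None if not found."""
    for n in range(2, n_max + 1):
        if bonferroni_power(d, sigma, n, alpha, m) >= target:
            return n
    return None


if __name__ == "__main__":
    for m in (1, 100, 3000):
        n_req = min_replicates(d=1.0, sigma=1.0, alpha=0.05, m=m)
        print(f"genes={m:5d}  replicates per group={n_req}")
```

Under these assumptions the required number of replicates grows with the number of genes tested, which is consistent with the reported decline in power for the Bonferroni-type procedure. Power for a false-discovery-rate procedure such as BH depends on the whole set of p-values and is not captured by this per-test calculation.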