Genome-wide DNA methylation microarrays are widely used in cancer research. However, recent studies suggest that batch effects in high-throughput data are often overlooked and can strongly affect results, leading to incorrect conclusions. The Cancer Genome Atlas (TCGA) database contains many high-throughput methylation array datasets that were generated in different batches. Here, we analyzed datasets from 7 cancer types in TCGA and found that batch effects were widespread in every cancer type. We then showed that ignoring batch effects could lead to incorrect biological conclusions. Finally, we suggest a simple approach for avoiding batch effects.