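Genome-wide association studies (GWAS) make it possible to examine directly the association between human behavioral abilities and genotype, giving psychology researchers a new means of exploring the genetic basis of human behavioral abilities at the whole-genome level. Because a GWAS involves association tests between a large number of loci and behavior, multiple testing correction is required to control the overall false alarm rate. Although several correction methods are available, their suitability for GWAS has not yet been studied systematically, so the choice of correction method in GWAS lacks theoretical and empirical grounding. The correction methods commonly used in GWAS are the Bonferroni correction, the Holm step-down procedure, and the permutation test, all based on the family-wise error rate (FWER) criterion, and the BH procedure, based on the false discovery rate (FDR) criterion. This paper describes the principles and procedures of these four multiple testing correction methods in detail, proposes a method for simulating GWAS data, and compares the correction methods quantitatively on the simulated data. The results show that the three FWER-based methods differ very little: they control false alarms most strictly, but they detect significantly fewer truly associated loci than the FDR-based BH procedure. On independent data, the SNPs reported by the BH procedure have the highest explained rate for behavior; that is, relative to the other methods, the BH procedure better balances false alarms and hits. Future studies may consider using the BH procedure to correct their results.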
Genome-Wide Association Studies (GWAS) can reveal the genetic basis of behavior. However, the association analysis poses a massive multiple testing problem, in which millions of SNPs (Single Nucleotide Polymorphisms) are tested. It is therefore vital to reduce the risk of false positives with an appropriate multiple testing correction method. First, the Family-Wise Error Rate (FWER) and the False Discovery Rate (FDR), the two standard measures of Type I error in multiple testing, were introduced. Second, three FWER-based methods (Bonferroni, Holm step-down, and permutation) and one FDR-based method (BH) were discussed, from concept to implementation. Finally, a method for simulating GWAS data was proposed, and the four correction methods were evaluated on the simulated data. Results showed that SNPs reported without multiple testing correction had both the highest average number of hits and the highest average number of false alarms. The FWER methods reported fewer false alarms, but their average number of hits was also lower than that of the uncorrected analysis or the BH method. In contrast, the BH method struck a good balance between false alarms and hits. Furthermore, a comprehensive index, called the explained rate, was introduced to evaluate the methods quantitatively. Results showed that the BH method had the highest explained rate. In future GWAS, researchers are advised to apply the BH method for multiple testing correction.
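To make the contrast between the FWER and FDR criteria concrete, the following is a minimal sketch of three of the four corrections named above (Bonferroni, Holm step-down, and BH, taken here to be the Benjamini-Hochberg step-up procedure), written in plain Python. It is an illustration under stated assumptions, not the implementation used in the study: the function names, the significance level of 0.05, and the eight example p-values are all hypothetical, and the permutation test is left out because it needs the genotype and phenotype data rather than p-values alone.

```python
from typing import List


def bonferroni(pvals: List[float], alpha: float = 0.05) -> List[bool]:
    """FWER control: reject a test if p <= alpha / m, where m is the number of tests."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def holm(pvals: List[float], alpha: float = 0.05) -> List[bool]:
    """FWER control (step-down): compare the k-th smallest p-value (k = 0, 1, ...)
    with alpha / (m - k) and stop at the first failure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for k, i in enumerate(order):
        if pvals[i] <= alpha / (m - k):
            reject[i] = True
        else:
            break  # all larger p-values are retained as well
    return reject


def benjamini_hochberg(pvals: List[float], alpha: float = 0.05) -> List[bool]:
    """FDR control (step-up): find the largest rank k with p_(k) <= alpha * k / m
    and reject the k smallest p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    largest_k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            largest_k = rank
    reject = [False] * m
    for i in order[:largest_k]:
        reject[i] = True
    return reject


# Hypothetical p-values for eight SNP association tests (illustration only).
example_pvals = [0.001, 0.008, 0.012, 0.02, 0.03, 0.2, 0.5, 0.9]

n_bonferroni = sum(bonferroni(example_pvals))    # 1 rejection
n_holm = sum(holm(example_pvals))                # 1 rejection
n_bh = sum(benjamini_hochberg(example_pvals))    # 5 rejections
```

On these made-up p-values the two FWER procedures each reject one test while BH rejects five, which is the same direction of difference the abstract reports. The toy numbers say nothing about which rejections are true associations.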