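Objective: To study the relative merits and accuracy of two classes of methods for microarray data analysis, those based on a self-contained null hypothesis and those based on a competitive null hypothesis, by taking a representative method from each class, GAGE (Generally Applicable Gene-set Enrichment) and GSEA (Gene Set Enrichment Analysis), and evaluating their power and performance in identifying enriched gene sets. Methods: The two methods were applied to real gene expression profile data. The accuracy and scientific soundness of the results were compared, and the effectiveness of each method in identifying enriched gene sets was assessed. Results: Application to known gene expression profile data showed that GAGE was superior to GSEA both in statistical power and in the biological relevance of the gene sets it identified. Conclusion: GAGE is a gene set analysis method based on the self-contained null hypothesis. It makes full use of the expression profile data, divides the expression data into experimental sets and pathway sets that are analyzed separately, and considers both up-regulation and down-regulation of gene sets. Its power is higher than that of GSEA, which is based on the competitive null hypothesis, and it yields more accurate and scientifically sound results.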
Objective: To compare the power of GAGE, a new method based on the "self-contained null hypothesis", with that of GSEA, a widely used method based on the "competitive null hypothesis". Methods: The two methods were used to analyze two different real microarray datasets, which together covered three different scenarios. The results were compared in two respects: (1) sensitivity and specificity of gene-set inference; (2) biological relevance of the gene sets identified. Results: In the first dataset, GAGE identified 6 of 12 and 4 of 10 differentially expressed gene sets as significant, whereas GSEA identified 3 of 12 and 2 of 10. In the second dataset, the gene sets identified by GAGE were much more biologically relevant than those identified by GSEA. Conclusion: GAGE is a new "self-contained null hypothesis" method that separates gene sets into pathway gene sets and experimentally derived gene sets and analyzes them separately. It also takes the log-based fold change into account. When applied to real datasets, GAGE consistently outperformed GSEA, another frequently used gene set analysis (GSA) method, and achieved much higher power.
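The distinction between the two null hypotheses can be illustrated with a minimal sketch. It is not the GAGE or GSEA algorithm: it assumes simple t statistics on per-gene log2 fold changes, and the expression values, the gene set membership, and all function names are hypothetical. A self-contained null states that no gene in the set is differentially expressed, while a competitive null states that the genes in the set are no more differentially expressed than the genes outside it.

```python
import math
from statistics import mean, stdev


def log2_fold_change(treated, control):
    """Per-gene log2 fold change from paired, positive expression values."""
    return [math.log2(t / c) for t, c in zip(treated, control)]


def self_contained_t(set_fc):
    """One-sample t statistic: does the set's mean log fold change differ from 0?"""
    n = len(set_fc)
    return mean(set_fc) / (stdev(set_fc) / math.sqrt(n))


def competitive_t(set_fc, rest_fc):
    """Welch two-sample t statistic: does the set differ from genes outside it?"""
    n1, n2 = len(set_fc), len(rest_fc)
    se = math.sqrt(stdev(set_fc) ** 2 / n1 + stdev(rest_fc) ** 2 / n2)
    return (mean(set_fc) - mean(rest_fc)) / se


# Hypothetical toy data: every gene is roughly doubled in the treated sample.
control = [100, 200, 150, 120, 300, 250, 180, 220]
treated = [210, 390, 310, 250, 590, 510, 350, 450]

fc = log2_fold_change(treated, control)
set_fc, rest_fc = fc[:4], fc[4:]  # assumed set: the first four genes

self_t = self_contained_t(set_fc)        # large: the set is clearly changed
comp_t = competitive_t(set_fc, rest_fc)  # small: the set is not unusual

print(f"self-contained t = {self_t:.2f}, competitive t = {comp_t:.2f}")
```

In this toy case the set is strongly changed relative to zero but not relative to the remaining genes, so the two formulations can reach different conclusions on the same data.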