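Complex diseases are currently prevalent worldwide and greatly affect human health. Studies have found that the traits of complex diseases are influenced by interactions among multiple loci. Current genome-wide association studies (GWAS) only resolve the contribution of individual SNPs to disease susceptibility, and relying on this strategy alone cannot produce a fundamental breakthrough in identifying the causes of complex diseases. Gene-gene interaction may be one of the main factors underlying complex diseases. To address this, researchers have proposed several algorithms for detecting gene interactions, including penalized logistic regression models, multifactor dimensionality reduction (MDR), the set-association approach, Bayesian networks, and random forests. This article first reviews these methods and points out their shortcomings, including excessive computational complexity, dependence on prior hypotheses, overfitting of the data, and insensitivity to low-dimensional data. It then briefly describes a GPU-based algorithm for studying gene interactions developed in the authors' laboratory. The algorithm has low complexity, requires no hypotheses, is not affected by marginal effects, is stable and fast, and is suitable for computing gene-gene interactions genome-wide.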
Complex diseases affect human health throughout the world. Hundreds of studies show that complex diseases are caused by multiple loci. Currently, genome-wide association studies (GWAS) focus only on single loci that contribute to susceptibility to a given disease. However, interactions between genes may be one of the main factors underlying complex traits. This has led scientists to propose several algorithms to detect such interactions, including the penalized logistic regression model, the multifactor dimensionality reduction method, the set association analysis method, Bayesian network analysis, and random forests. However, these algorithms have high computational complexity, are hypothesis-driven, overfit the data, or are insensitive to low-dimensional data. In this paper, we review these algorithms and then present a new GPU-based algorithm that provides a powerful strategy for analyzing gene-gene interactions in genome-wide association datasets. This algorithm has low computational complexity, is free of hypotheses, is not affected by single-locus marginal effects, and is highly stable and fast.
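To make the computational-complexity argument concrete, the following Python sketch shows the core step of standard multifactor dimensionality reduction (MDR) for one SNP pair, wrapped in a brute-force scan over all pairs. It is not the GPU algorithm described in this paper, whose details the abstract does not give. Assumptions (all illustrative): genotypes are coded 0/1/2, status is 1 for cases and 0 for controls, both classes are present, a genotype cell is labeled high risk when its case:control ratio exceeds the overall ratio, and the score is training balanced accuracy. The cross-validation that MDR uses in practice is omitted, and the function names `mdr_pair_accuracy` and `exhaustive_pair_scan` are hypothetical.

```python
from itertools import combinations
from math import comb


def mdr_pair_accuracy(g1, g2, status):
    """Hypothetical helper: MDR-style score for one SNP pair.

    g1, g2: genotype codes (0/1/2) per individual; status: 1 = case, 0 = control.
    Returns the training balanced accuracy of the high/low-risk labeling.
    """
    n_case = sum(status)
    n_ctrl = len(status) - n_case
    threshold = n_case / n_ctrl  # overall case:control ratio (assumed threshold)

    # Count cases and controls in each two-locus genotype cell (up to 3 x 3 = 9 cells).
    cases, ctrls = {}, {}
    for a, b, y in zip(g1, g2, status):
        counts = cases if y == 1 else ctrls
        counts[(a, b)] = counts.get((a, b), 0) + 1

    # A cell is high risk if its case:control ratio exceeds the threshold
    # (written as c > threshold * k so cells without controls need no division).
    high = {cell for cell in set(cases) | set(ctrls)
            if cases.get(cell, 0) > threshold * ctrls.get(cell, 0)}

    # Predict "case" for high-risk cells and "control" for all other cells.
    tp = sum(n for cell, n in cases.items() if cell in high)
    tn = sum(n for cell, n in ctrls.items() if cell not in high)
    return 0.5 * (tp / n_case + tn / n_ctrl)


def exhaustive_pair_scan(genotypes, status):
    """Hypothetical helper: score every SNP pair, return (best_accuracy, i, j)."""
    best = None
    for i, j in combinations(range(len(genotypes)), 2):
        acc = mdr_pair_accuracy(genotypes[i], genotypes[j], status)
        if best is None or acc > best[0]:
            best = (acc, i, j)
    return best


# Toy data: 9 individuals covering every genotype combination of SNPs 0 and 1.
# Disease status depends on both SNPs jointly: case when (g0 + g1) is odd.
snp0 = [0, 0, 0, 1, 1, 1, 2, 2, 2]
snp1 = [0, 1, 2, 0, 1, 2, 0, 1, 2]
snp2 = [1] * 9  # uninformative SNP
status = [(a + b) % 2 for a, b in zip(snp0, snp1)]
genotypes = [snp0, snp1, snp2]

print(exhaustive_pair_scan(genotypes, status))  # → (1.0, 0, 1)
print(comb(500_000, 2))  # pairs for a 500,000-SNP panel → 124999750000
```

The scan visits C(M, 2) pairs, so an illustrative panel of 500,000 SNPs already requires about 1.25 × 10^11 pair evaluations before any higher-order combinations are considered. Because each pair is scored independently, this loop is an embarrassingly parallel workload, which is one reason GPU implementations are attractive for genome-wide interaction analysis.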