Discovering and predicting lung cancer disease genes helps clarify the mechanisms that drive lung cancer and guides its diagnosis, prevention, and treatment; it is also an important goal of human genome research. Existing random walk with restart algorithms on heterogeneous (two-layer) networks generally walk a single step within the disease phenotype network, the protein-protein interaction (PPI) network, and the disease phenotype-protein bipartite network, and then jump between networks. This strategy searches inefficiently and may miss local topological information in the protein (or disease) network. To address this, an asynchronous random walk with restart algorithm on heterogeneous networks, called ARWRH, is proposed; it builds a disease phenotype-protein heterogeneous network to mine potential lung cancer risk genes. ARWRH first walks a different number of steps within the disease phenotype network, the PPI network, and the disease phenotype-protein bipartite network, then jumps between networks, and iterates until the probability vector reaches a steady state, from which candidate disease genes are obtained. Simulation experiments show that ARWRH effectively predicts potential lung cancer risk genes, and most of the predicted genes are supported by evidence in the literature.
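The abstract does not state the update rule, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the common random-walk-with-restart formulation on a heterogeneous network and models "walking different numbers of steps" by raising each intra-network transition matrix to its own power before the inter-network jump. The function name `async_rwrh`, the parameters `gene_steps`, `pheno_steps`, `jump`, `restart`, and `eta`, and the toy networks are all hypothetical, and both intra-network graphs are assumed to have no isolated nodes.

```python
import numpy as np


def column_normalize(a):
    """Scale each column to sum to 1; all-zero columns are left as zeros."""
    a = np.asarray(a, dtype=float)
    sums = a.sum(axis=0, keepdims=True)
    sums[sums == 0] = 1.0
    return a / sums


def async_rwrh(gene_adj, pheno_adj, bipartite, seed_genes, seed_phenos,
               gene_steps=2, pheno_steps=1, jump=0.5, restart=0.7, eta=0.5,
               tol=1e-10, max_iter=1000):
    """Sketch of an asynchronous random walk with restart (assumed form).

    bipartite has shape (n_genes, n_phenos). Seed lists must be non-empty.
    """
    b = np.asarray(bipartite, dtype=float)
    n_g, n_p = b.shape

    # Assumption: several intra-network steps are modelled as matrix powers.
    w_g = np.linalg.matrix_power(column_normalize(gene_adj), gene_steps)
    w_p = np.linalg.matrix_power(column_normalize(pheno_adj), pheno_steps)

    w_gp = column_normalize(b)      # phenotype -> gene, shape (n_g, n_p)
    w_pg = column_normalize(b.T)    # gene -> phenotype, shape (n_p, n_g)

    # A node can only jump if it has at least one cross-network link.
    lam_g = jump * (b.sum(axis=1) > 0)
    lam_p = jump * (b.sum(axis=0) > 0)

    # Column-stochastic transition matrix over genes followed by phenotypes.
    m = np.block([[w_g * (1 - lam_g), w_gp * lam_p],
                  [w_pg * lam_g, w_p * (1 - lam_p)]])

    # Restart vector: eta weights phenotype seeds against gene seeds.
    p0 = np.zeros(n_g + n_p)
    p0[list(seed_genes)] = (1 - eta) / len(seed_genes)
    p0[[n_g + i for i in seed_phenos]] = eta / len(seed_phenos)

    # Iterate until the probability vector stops changing.
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (m @ p) + restart * p0
        converged = np.abs(p_next - p).sum() < tol
        p = p_next
        if converged:
            break
    return p[:n_g], p[n_g:]


# Toy data (hypothetical): 4 genes on a path, 3 fully connected phenotypes.
gene_adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
pheno_adj = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
bipartite = [[1, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 0]]

gene_scores, pheno_scores = async_rwrh(gene_adj, pheno_adj, bipartite,
                                       seed_genes=[0], seed_phenos=[0])
ranking = np.argsort(-gene_scores)  # candidate genes, best first
```

In this sketch, ranking non-seed genes by their steady-state probability gives the candidate list; how the step counts per network should be chosen is not specified in the abstract and is left as a free parameter here.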