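Discovering and predicting disease-associated genes helps to clarify the mechanisms of disease onset and supports diagnosis, prevention and treatment; it is an important goal of human genome research. Diseases with overlapping clinical manifestations are often caused by variants in one or more genes within the same functional module, and genes that give rise to similar disease phenotypes frequently interact with one another, directly or indirectly. In other words, disease genes exhibit network modularity. Motivated by this, the authors use the k-nearest-neighbor idea to extend the initial probability vector of the random walk with restart on heterogeneous network (RWRH) algorithm, and propose an improved heterogeneous-network random walk algorithm, KRWRH, to mine potential risk disease genes more deeply in the gene-phenotype heterogeneous network. KRWRH builds the initial probability vector from an expanded seed set that contains the known disease genes and their k-nearest-neighbor genes; it then walks randomly in the heterogeneous network and iterates until a steady-state probability vector is reached, from which the candidate disease genes are obtained. Simulation experiments on 18 genetic diseases from the Online Mendelian Inheritance in Man (OMIM) database show that KRWRH can effectively predict potential risk disease genes.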
Identifying and predicting disease-associated genes broadens our understanding of the cellular mechanisms that drive human disease and guides diagnosis, prognosis and therapeutic intervention. It is also an important objective of the Human Genome Project. Diseases with similar clinical manifestations are often caused by a group of functionally related genes, and these disease genes show modularity in the protein-protein interaction network. Based on this disease gene modularity and the k-nearest neighbor (KNN) method, this paper proposes an improved random walk with restart algorithm on the heterogeneous network, named KRWRH. KRWRH constructs the initial probability vector from an enlarged seed set that contains the known disease genes and their k-nearest-neighbor genes, and then walks in the heterogeneous network. The walk is iterated until a steady-state probability vector is formed, from which the candidate disease genes are obtained. Simulation results on 18 diseases collected from Online Mendelian Inheritance in Man (OMIM) show that KRWRH outperforms existing methods in prioritizing risk disease genes, with higher sensitivity.
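The two steps described above, seed expansion followed by a random walk with restart, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the k nearest neighbors are chosen or how the transition matrix is built, so the sketch assumes that neighbors are ranked by edge weight to each seed gene and that the walk runs on a plain column-normalized block adjacency matrix. The function names (`expand_seeds`, `krwrh_sketch`) and the parameters `restart` and `eta` are hypothetical.

```python
import numpy as np


def expand_seeds(gene_adj, seeds, k):
    """Seeds plus, for each seed, its k most strongly connected neighbours.

    Assumption: "nearest" means largest edge weight in the gene network.
    """
    gene_adj = np.asarray(gene_adj, dtype=float)
    expanded = set(seeds)
    for s in seeds:
        weights = gene_adj[s].copy()
        weights[s] = 0.0  # ignore self-loops
        order = np.argsort(-weights, kind="stable")
        expanded.update(int(j) for j in order[:k] if weights[j] > 0)
    return sorted(expanded)


def column_normalize(m):
    """Scale every non-empty column so that it sums to 1."""
    col_sums = m.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # leave empty columns as zeros
    return m / col_sums


def krwrh_sketch(gene_adj, pheno_adj, gene_pheno, seed_genes, seed_phenos,
                 k=1, restart=0.7, eta=0.5, tol=1e-10, max_iter=1000):
    """Return (gene scores, expanded gene seed set) for a toy heterogeneous walk."""
    gene_adj = np.asarray(gene_adj, dtype=float)
    pheno_adj = np.asarray(pheno_adj, dtype=float)
    gene_pheno = np.asarray(gene_pheno, dtype=float)
    n_genes, n_phenos = gene_pheno.shape

    # Simplified heterogeneous network: genes first, phenotypes second.
    hetero = np.block([[gene_adj, gene_pheno],
                       [gene_pheno.T, pheno_adj]])
    transition = column_normalize(hetero)

    # Initial vector: mass (1 - eta) on the expanded gene seeds,
    # mass eta on the seed phenotypes.
    genes = expand_seeds(gene_adj, seed_genes, k)
    p0 = np.zeros(n_genes + n_phenos)
    p0[genes] = (1.0 - eta) / len(genes)
    p0[[n_genes + i for i in seed_phenos]] = eta / len(seed_phenos)

    # Iterate p <- (1 - r) * W p + r * p0 until the L1 change is below tol.
    p = p0.copy()
    for _ in range(max_iter):
        nxt = (1.0 - restart) * transition @ p + restart * p0
        converged = np.abs(nxt - p).sum() < tol
        p = nxt
        if converged:
            break
    return p[:n_genes], genes
```

Non-seed genes would then be ranked by their steady-state score, highest first. Setting `k=0` leaves the seed set unchanged, which reduces the sketch to an ordinary random walk with restart on the same simplified network.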