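Biological pathways are widely used in studies of gene function, but current pathway knowledge is incomplete and still needs to be expanded. Bioinformatics prediction offers an effective and economical route to pathway expansion. This article proposes a new method for predicting the pathways of a gene by combining protein-protein interaction knowledge with information from the Gene Ontology (GO) database. First, the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways that contain the target gene's neighbors at the protein-protein interaction level are selected as candidate pathways. The pathway membership of the target gene is then decided by testing whether the genes in each candidate pathway are enriched in the GO nodes associated with the target gene. Predictions were made separately with the protein-protein interaction data in the Human Protein Reference Database (HPRD) and in the Biological General Repository for Interaction Datasets (BioGRID). In both data sets, the average accuracy (the proportion of successfully predicted pathways among all pathways annotated to the target genes) and the relative accuracy (among the genes with at least one annotated pathway successfully predicted, the proportion of genes for which all annotated pathways were predicted correctly) both rose as the number of interacting neighbors increased. When the number of interacting neighbors reached 22, the average accuracy was 96.2% (HPRD) and 96.3% (BioGRID), and the relative accuracy was 93.3% (HPRD) and 84.1% (BioGRID). The method was further validated on 89 genes whose annotations in an older database release were updated in a newer release: 50 genes had at least one updated pathway predicted correctly, and for 43 of these all updated pathways were predicted correctly, a relative accuracy of 86.0%. These results indicate that the method is a reliable and effective approach to pathway expansion.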
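The two-step procedure and the two accuracy measures described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not name the enrichment test, so a one-sided hypergeometric test with a cutoff of 0.05 is assumed here, without multiple-testing correction. All function names, the gene, pathway, and GO identifiers, and the toy data are hypothetical.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k), where X counts marked genes among N drawn from M genes, n of them marked."""
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, min(n, N) + 1)) / comb(M, N)


def candidate_pathways(target, ppi, pathway_genes):
    """Step 1: pathways that contain at least one interaction neighbor of the target."""
    neighbors = ppi.get(target, set())
    return {p for p, genes in pathway_genes.items() if genes & neighbors}


def predict_pathways(target, ppi, pathway_genes, go_genes, background, alpha=0.05):
    """Step 2: keep a candidate pathway if its genes are enriched in a GO term of the target.

    Assumption: hypergeometric enrichment test, target gene excluded from all sets.
    """
    universe = background - {target}
    target_terms = [genes & universe for genes in go_genes.values() if target in genes]
    predicted = set()
    for pathway in candidate_pathways(target, ppi, pathway_genes):
        members = pathway_genes[pathway] & universe
        for term_genes in target_terms:
            overlap = len(members & term_genes)
            if hypergeom_sf(overlap, len(universe), len(term_genes), len(members)) < alpha:
                predicted.add(pathway)
                break
    return predicted


def average_accuracy(predicted, annotated):
    """Correctly predicted pathways divided by all annotated pathways, over all genes."""
    correct = sum(len(predicted.get(g, set()) & paths) for g, paths in annotated.items())
    return correct / sum(len(paths) for paths in annotated.values())


def relative_accuracy(predicted, annotated):
    """Among genes with at least one hit, the share with every annotated pathway predicted."""
    hit = [g for g, paths in annotated.items() if predicted.get(g, set()) & paths]
    full = [g for g in hit if annotated[g] <= predicted[g]]
    return len(full) / len(hit)


# Toy data (entirely made up): 20 background genes plus the target "T".
background = {f"g{i}" for i in range(20)} | {"T"}
ppi = {"T": {"g0", "g1"}}
pathway_genes = {
    "P1": {"g0", "g2", "g3", "g4"},
    "P2": {"g1", "g10", "g11", "g12"},
    "P3": {"g15", "g16"},
}
go_genes = {"GO:A": {"T", "g0", "g2", "g3", "g4", "g5"}, "GO:B": {"g10", "g11"}}

toy_prediction = predict_pathways("T", ppi, pathway_genes, go_genes, background)
```

In the toy data, P1 and P2 are candidates because each contains a neighbor of T, but only the genes of P1 overlap the single GO term annotated to T. The accuracy functions assume non-empty inputs; with the validation counts reported in the abstract, the relative accuracy is 43/50 = 86.0%.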
Biological pathways have been widely used in gene function studies; however, current knowledge of biological pathways is incomplete and needs to be expanded. Bioinformatics prediction provides a cheap but effective way to expand pathways. Here, we propose a novel method for biological pathway prediction that integrates prior knowledge of protein-protein interactions with the Gene Ontology (GO) database. First, the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways to which the interacting neighbors of a target gene (at the level of protein-protein interaction) belong were chosen as the candidate pathways. Then, the pathways to which the target gene belongs were determined by testing whether the genes in the candidate pathways were enriched in the GO terms to which the target gene was annotated. The protein-protein interaction data obtained from the Human Protein Reference Database (HPRD) and the Biological General Repository for Interaction Datasets (BioGRID) were used separately to predict the pathway attribution(s) of the target gene. The results demonstrated that both the average accuracy (the ratio of the correctly predicted pathways to the total number of pathways to which all the target genes were annotated) and the relative accuracy (among the genes with at least one annotated pathway successfully predicted, the percentage of genes with all annotated pathways correctly predicted) increased with the number of interacting neighbors. When the number of interacting neighbors reached 22, the average accuracy was 96.2% (HPRD) and 96.3% (BioGRID), and the relative accuracy was 93.3% (HPRD) and 84.1% (BioGRID). Further validation on 89 genes whose pathway knowledge was updated in a new database release indicated that 50 genes were correctly predicted for at least one updated pathway, and 43 genes were accurately predicted for all the updated pathways, giving an estimate of the r