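Identifying co-mutated gene pairs makes it possible to study the biological functions that are jointly disrupted during the onset and progression of cancer, offering new clues to the mechanisms of carcinogenesis. At present, such studies rely mainly on the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Because KEGG tends to define broad pathways, it cannot be used to determine whether an entire pathway or only part of it is associated with cancer. Gene Ontology, by contrast, defines biological functions at levels ranging from broad to fine-grained, so studying the co-disruption of biological functions in cancer on the basis of Gene Ontology terms is a reasonable choice. This paper proposes an algorithm that finds pairs of Gene Ontology terms between which more co-mutated gene pairs are annotated than would be expected by chance. Because dependencies among Gene Ontology terms make the identified term pairs redundant, we also propose a redundancy-removal algorithm that identifies non-redundant, typical term pairs. Using somatic mutation screening data from lung adenocarcinoma genomes, we found 78 typical co-mutated term pairs. These pairs include both broad and fine-grained biological functions, delineate more precisely the scope of the co-disrupted functions, and provide new clues to the mechanism of lung adenocarcinoma.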
Using gene pairs co-mutated in cancer genomes, we can study the co-disruption of functions in carcinogenesis and gain new insight into the molecular mechanisms of cancer. Because the Kyoto Encyclopedia of Genes and Genomes (KEGG) usually defines general pathways, it is hard, if not impossible, to determine whether the entire pathway or only a part of it is disrupted in cancer. In contrast, Gene Ontology defines functions at various levels of specificity in a hierarchical manner. It is therefore reasonable to study the co-disrupted biological functions in carcinogenesis based on Gene Ontology. In this paper, we develop an algorithm to find pairs of Gene Ontology terms that are significantly overrepresented with between-term co-mutated gene pairs. Because the identified term pairs tend to be redundant owing to the dependencies among Gene Ontology terms, we also propose an algorithm to identify non-redundant term pairs. Based on a somatic mutation screening dataset for lung adenocarcinoma, we found 78 typical pairs of Gene Ontology terms. These term pairs include both general and specific functions, which define the range of co-disrupted biological functions more precisely and provide new insight into the mechanism of lung adenocarcinoma.
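The idea of a term pair being "overrepresented with between-term co-mutated gene pairs" can be illustrated with a small sketch. The abstract does not state which statistical test is used, so the one-sided hypergeometric test below is an assumption, as are the function names (`hypergeom_sf`, `between_term_pairs`, `term_pair_pvalue`) and the toy gene labels. The sketch covers neither multiple-testing correction nor the redundancy-removal step.

```python
from itertools import product
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n items from N, of which K are 'successes'."""
    total = comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible outcomes contribute nothing to the sum.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def between_term_pairs(genes_a, genes_b):
    """Unordered gene pairs with one gene annotated to each term (no self-pairs)."""
    return {frozenset((a, b)) for a, b in product(genes_a, genes_b) if a != b}


def term_pair_pvalue(genes_a, genes_b, comutated_pairs, n_genes):
    """Assumed test: are co-mutated pairs enriched among between-term pairs?

    The background population is every unordered pair among n_genes genes.
    """
    candidates = between_term_pairs(genes_a, genes_b)
    observed = len(candidates & comutated_pairs)
    p_value = hypergeom_sf(
        observed,                 # co-mutated pairs seen between the two terms
        comb(n_genes, 2),         # all possible gene pairs
        len(comutated_pairs),     # all co-mutated pairs
        len(candidates),          # pairs spanning the two terms
    )
    return observed, p_value


# Toy data (hypothetical): six screened genes, three co-mutated pairs.
comutated = {frozenset(p) for p in [("A", "C"), ("A", "D"), ("B", "C")]}
term_a = {"A", "B"}   # genes annotated to the first term
term_b = {"C", "D"}   # genes annotated to the second term

observed, p_value = term_pair_pvalue(term_a, term_b, comutated, n_genes=6)
print(observed, p_value)
```

In this toy case three of the four between-term pairs are co-mutated, against a background where only three of fifteen possible pairs are, so the tail probability is small. A real analysis would need a background matched to the genes actually screened and annotated.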