Constructing phylogenetic trees is an important means of studying the origin and evolution of species. Based on KEGG (Kyoto Encyclopedia of Genes and Genomes) metabolic pathways, this paper introduces the graph-theoretic concept of a "kernel" and proposes a method for constructing phylogenetic trees. We first address how to extract and represent metabolic pathway data without loss of information. We then define the similarity between two metabolic pathways as the weighted sum of two matching scores, one for the kernel parts of their graphs and one for the non-kernel parts, and use the resulting distance matrix to build a phylogenetic tree of the species. Comparing the trees obtained from extensive experimental data with the NCBI (National Center for Biotechnology Information) taxonomy validates the effectiveness of the method.
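The weighted similarity and the distance matrix described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify the matching score, the weights, or how the kernel is computed. The sketch therefore assumes that each pathway is already split into a set of kernel edges and a set of non-kernel edges, uses the Jaccard overlap as a stand-in matching score, and takes distance as one minus similarity. The names `jaccard`, `pathway_similarity`, `distance_matrix`, and `kernel_weight` are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two sets; two empty sets count as identical.

    Assumption: this stands in for the paper's unspecified matching score.
    """
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pathway_similarity(p, q, kernel_weight=0.7):
    """Weighted sum of the kernel and non-kernel matching scores.

    p and q are dicts with "kernel" and "non_kernel" edge sets.
    kernel_weight is an assumed value, not taken from the paper.
    """
    kernel_score = jaccard(p["kernel"], q["kernel"])
    other_score = jaccard(p["non_kernel"], q["non_kernel"])
    return kernel_weight * kernel_score + (1.0 - kernel_weight) * other_score


def distance_matrix(pathways, kernel_weight=0.7):
    """Symmetric distance table with distance = 1 - similarity (assumed)."""
    names = sorted(pathways)
    dist = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        d = 1.0 - pathway_similarity(pathways[a], pathways[b], kernel_weight)
        dist[a][b] = dist[b][a] = d
    return names, dist
```

The resulting table could then be passed to any distance-based tree builder, such as neighbor joining or UPGMA. The abstract does not say which one the authors use.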