Orthology is the relationship between genes that share a common ancestor as a result of a speciation event; orthologous genes typically retain similar structure and biological function. With the rapid accumulation of genomic and transcriptomic sequences, accurate, automated identification of orthologs facilitates functional gene annotation as well as comparative and evolutionary genomics. This paper reviews the main methods for identifying orthologous genes and lists the databases built with them. The methods fall into three categories. The first, similarity-based methods, are fast and highly sensitive. The second, tree-based methods, which rely on constructing phylogenetic trees, are accurate and informative. The third, hybrid methods, combine the two approaches and strike a better balance between sensitivity and accuracy. Finally, the problems that remain in ortholog identification are summarized.