With the rapid development of high-throughput DNA sequencing, the genomes of a growing number of species have been sequenced. Locating coding genes and determining their structures are the basic tasks of genome annotation, yet annotation methods to date have relied mainly on DNA and RNA sequence information. To interpret sequenced genomes more precisely, multiple types of omics data need to be integrated into genome annotation. In recent years, proteomics based on tandem mass spectrometry has matured and now achieves high proteome coverage, making it possible to annotate genomes with tandem mass spectrometry data. Such data can, on the one hand, verify the expression of annotated genes and, on the other, correct existing gene annotations and reveal novel genes, thereby re-annotating the genome sequence. This is the subject of proteogenomics, a field that is currently advancing rapidly. Systematic proteogenomic annotation of sequenced genomes has become an important complement to other approaches for interpreting genomes. This article reviews the main research topics and methods of proteogenomics and discusses future directions for the field.