In recent years, high-throughput proteomics based on mass spectrometry has developed rapidly, and identifying proteins from tandem mass spectra is a basic yet essential step in its data processing. Because de novo peptide sequencing does not rely on a protein sequence database, it can analyze tandem mass spectra from new species or from species whose genomes have not been sequenced, an advantage that database searching cannot replace. This paper briefly introduces the de novo sequencing problem for high-throughput tandem mass spectra and its current state of research. Several typical computational strategies are identified, and the advantages and disadvantages of each are analyzed. Commonly used de novo sequencing algorithms and software are summarized, along with the criteria and datasets commonly used for algorithm assessment. Finally, the characteristics of the algorithms are outlined and possible directions for future research are discussed.