Metagenomics seeks to understand the composition of environmental microbial communities and their interactions with the environment by sequencing and analyzing the DNA of those communities. It has revolutionized microbiology by making it possible to study microbial communities in complex biological systems without culturing the microbes. Continued advances in next-generation sequencing technology and the rapid development of bioinformatics have driven high-throughput metagenomic research. As large volumes of high-quality metagenomic data are produced and made available to the scientific community, the importance of metagenomics has become widely recognized. At the same time, the growing number of metagenomic samples from individuals in different health states and from different sites of the human body has made the comparison and classification of metagenomic samples increasingly important in microbiological research, and comparative metagenomics has become an important branch of the field. This review introduces research and algorithms for the analysis and comparison of metagenomic data and for the classification of metagenomic samples.