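The metagenome, a concept proposed by Handelsman et al. (1998), broadly refers to the genomes of all species in the microbial community of a particular environmental sample (e.g., the guts of humans and animals, breast milk, soil, lakes, glaciers, and oceans). Metagenomic technology originated in environmental microbiology research, and next-generation high-throughput sequencing has made its wide application possible. As in genomics, the current bottleneck in metagenomics is how to efficiently analyze the massive volume of data produced by high-throughput sequencing, so bioinformatic analysis methods and platforms are key to metagenomic research. This article introduces the main bioinformatics software and tools currently used in metagenomic research. Current metagenomic studies rely on two main sequencing approaches, "whole genome sequencing" and "amplicon sequencing". Because the data they produce and the corresponding analysis methods differ considerably, the software platforms for each approach are introduced separately.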
Metagenome, a term first coined by Handelsman et al. in 1998 for "the genomes of the total microbiota found in nature", refers to sequence data sampled directly from the environment (which may be any habitat in which microbes live, such as the guts of humans and animals, milk, soil, lakes, glaciers, and oceans). Metagenomic technologies originated in environmental microbiology, and their wide application has been greatly facilitated by next-generation high-throughput sequencing technologies. As in genomics, the bottleneck of metagenomic research is how to analyze the enormous volume of metagenomic sequence data effectively and efficiently with bioinformatics pipelines to obtain meaningful biological insights. In this article, we briefly review state-of-the-art bioinformatics software tools for metagenomic research. The data obtained from whole genome sequencing (i.e., shotgun metagenomics) and from amplicon sequencing (i.e., 16S rRNA and gene-targeted metagenomics) differ substantially, and so do the corresponding bioinformatics tools; accordingly, we review the computational pipelines for these two types of data separately.