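The rapid, large-scale accumulation of biological and medical big data is an important feature of today's life sciences. Whether such big data can yield major discoveries about the laws governing life phenomena, however, is the key question of greater concern, and it is also one of the 125 most challenging questions put forward by Science magazine in 2005. This article reviews the major developments in biological big data over the past decade and the scientific progress they have already brought, from the perspectives of next-generation DNA sequencing technology, medical genetics, synthetic biology, precision medicine, and microbiome research, and offers an outlook on future directions.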
The rapid accumulation of biological and medical big data is a defining characteristic of the current life sciences. Typical examples of such big data are the omics data obtained with so-called next-generation sequencing (NGS) technology, including genomic, epigenomic, metagenomic, and transcriptomic data. The amount of data is overwhelming, but the key question is whether such big data can lead to major discoveries about the rules or mechanisms behind life. The question "how will big pictures emerge from a sea of biological data" was posed as one of the 125 most challenging questions by Science magazine in 2005. More than ten years have now passed. This article provides a brief review and perspective on the development of next-generation DNA sequencing technology over the past decade, as well as its applications and impact in the fields of medical genetics, synthetic biology, precision medicine, and microbiome studies. Biological big data have already brought new discoveries in biology, although more are expected, and these discoveries have already shown promising applications in medical practice. The wide availability of DNA sequencing, and especially its rapidly decreasing cost, is revolutionizing the study of genetic diseases. Large-scale genome-wide association studies (GWAS) and exome-sequencing studies have discovered many genes that are associated with many types of disease. Some have already begun to be used in clinical diagnosis. Alongside the development of technologies for reading genomes, genome editing technology shows the other side of the story: editing and writing genomes. Such technologies are enabling the systematic understanding of complex biological systems and point to promising new approaches for disease treatment and prevention. Cancer research is a major field that has benefited the most from omics data.
Our understanding of how cancer develops continues to deepen, and a new taxonomy of cancer types based on genomic and epigenomic data brin