
A genome annotation assessment method based on RNA-Seq

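ISSN: 0023-074X
Journal: Chinese Science Bulletin (科学通报)
Date: November 2013
Pages: 3471-3482
Classification: Q987 [Biology: Genetics; Biology: Anthropology]
Affiliation: [1] Department of Automation, School of Information Science and Technology, Xiamen University, Xiamen 361000
Funding: Supported by the National Natural Science Foundation of China (61203282, 61202144) and the U.S. National Institutes of Health (NIH/NIMH 5RC2MH090047-01).
Acknowledgments: We thank Dr. Gary Shroth of Illumina for providing high-quality RNA-Seq sequencing data.
Related project: Stability of feature selection for high-dimensional data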

Authors: Wang Ying, Liu Lin

Keywords: genome, transcriptome, annotation database, RNA-Seq, sensitivity, specificity

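Abstract (Chinese version):

RNA-Seq data produced by next-generation sequencing have brought a breakthrough in decoding eukaryotic transcriptomes. Because RNA-Seq resolves information down to the individual base, an existing genome can be annotated using RNA-Seq as the sole data source. Likewise, RNA-Seq can be used to validate existing annotations of splice junctions, exons and even whole transcripts. This paper therefore proposes using RNA-Seq data to evaluate existing genome annotation databases. Based on RNA-Seq alignment information, it defines specificity and sensitivity measures at the gene, transcript, exon, splice-junction and base levels, and uses them to assess the completeness and accuracy of a genome annotation database. Using this framework, 5 representative human genome annotation databases were evaluated with 1.1 billion RNA-Seq reads from 16 human tissues, and a combined, accurate annotation database for the human body was built from the results. The existing rhesus macaque genome annotation database was also evaluated: it was found to be substantially incomplete, and its annotation accuracy falls well short of that of the human databases. With this assessment framework, the completeness and accuracy of genome annotations for any species can be evaluated and validated comprehensively, quickly and efficiently.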
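The abstract does not give the exact formulas, so the following is only a minimal sketch of one common way to read "sensitivity" and "specificity" when RNA-Seq evidence is treated as the reference: sensitivity (completeness) is the fraction of RNA-Seq-supported features that the database annotates, and specificity (accuracy) is the fraction of annotated features that RNA-Seq supports. It works at the splice-junction level; the junction key `(chromosome, strand, donor, acceptor)`, the function name `sensitivity_specificity` and the toy data are hypothetical illustrations, not the authors' data model.

```python
from typing import Hashable, Set, Tuple

# Hypothetical junction key: (chromosome, strand, donor, acceptor).
Junction = Tuple[str, str, int, int]


def sensitivity_specificity(annotated: Set[Hashable],
                            observed: Set[Hashable]) -> Tuple[float, float]:
    """Score an annotation against RNA-Seq-supported features (assumed definitions).

    sensitivity (completeness) = |annotated & observed| / |observed|
    specificity (accuracy)     = |annotated & observed| / |annotated|
    """
    tp = len(annotated & observed)  # annotated features that RNA-Seq confirms
    sn = tp / len(observed) if observed else 0.0
    sp = tp / len(annotated) if annotated else 0.0
    return sn, sp


# Toy annotation (hypothetical coordinates).
annotated_junctions: Set[Junction] = {
    ("chr1", "+", 1000, 1500),
    ("chr1", "+", 1600, 2100),
    ("chr1", "+", 2200, 2800),
    ("chr2", "-", 500, 900),
}
# Junctions assumed to have already passed some read-support threshold.
observed_junctions: Set[Junction] = {
    ("chr1", "+", 1000, 1500),
    ("chr1", "+", 1600, 2100),
    ("chr3", "+", 300, 700),
}

sn, sp = sensitivity_specificity(annotated_junctions, observed_junctions)
print(f"sensitivity={sn:.3f} specificity={sp:.3f}")  # sensitivity=0.667 specificity=0.500
```

Because the function only works on sets, the same formula could in principle be reused at the exon or base level by changing what counts as a feature; a real base-level pipeline would more likely use interval arithmetic than materialise every position.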

Abstract (English version):

RNA-Seq has brought a breakthrough in decoding eukaryotic transcriptomes. With resolution down to the single-nucleotide level, RNA-Seq can serve as the sole data source for annotating a whole genome. By the same token, RNA-Seq should be able to validate annotated splice-junction, exon and transcript sets. This study therefore proposes a scheme that uses RNA-Seq datasets as the only resource to evaluate the accuracy (specificity) and completeness (sensitivity) of genome annotation databases at the gene, transcript, exon, splice-junction and nucleotide levels. The scheme was applied to assess 5 widely used human genome annotation databases with 1.1 billion high-quality RNA-Seq reads from 16 human tissues. Accurately annotated transcripts were collected from the 5 databases to build combined accurate-annotation transcript databases for each of the 16 tissues and for the whole human body. Furthermore, the assessment of the current rhesus macaque annotation database showed that it is far from complete and less accurate than the human annotations. The RNA-Seq analysis pipeline was built to enable rapid and efficient assessment of genome annotations for various organisms across the whole transcriptome. The pipeline can be downloaded from http://code.google.com/p/genome-annotation-assessment-pipeline/downloads/.
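The abstract does not state how "accurately annotated" transcripts were selected or merged across the 5 databases. The sketch below assumes one plausible rule purely for illustration: a multi-exon transcript counts as accurate if every one of its splice junctions is supported by RNA-Seq, and transcripts from different databases are merged when they share the same ordered intron chain. The names `intron_chain` and `combine_accurate`, the database labels and all coordinates are hypothetical; single-exon transcripts are simply skipped here because a junction-based rule cannot judge them.

```python
from typing import Dict, List, Set, Tuple

Junction = Tuple[str, str, int, int]              # hypothetical (chrom, strand, donor, acceptor)
Transcript = Tuple[str, str, List[Tuple[int, int]]]  # (chrom, strand, exon intervals)
Chain = Tuple[Junction, ...]


def intron_chain(chrom: str, strand: str, exons: List[Tuple[int, int]]) -> Chain:
    """Ordered splice junctions implied by a transcript's exon coordinates."""
    ordered = sorted(exons)
    return tuple((chrom, strand, ordered[i][1], ordered[i + 1][0])
                 for i in range(len(ordered) - 1))


def combine_accurate(databases: Dict[str, Dict[str, Transcript]],
                     observed: Set[Junction]) -> Dict[Chain, Set[str]]:
    """Merge RNA-Seq-supported transcripts from several databases by intron chain."""
    combined: Dict[Chain, Set[str]] = {}
    for db_name, transcripts in databases.items():
        for chrom, strand, exons in transcripts.values():
            chain = intron_chain(chrom, strand, exons)
            # Assumed criterion: non-empty chain with every junction supported.
            if chain and all(j in observed for j in chain):
                combined.setdefault(chain, set()).add(db_name)
    return combined


databases = {
    "db_A": {
        "tA1": ("chr1", "+", [(900, 1000), (1500, 1600), (2100, 2200)]),
        "tA2": ("chr1", "+", [(900, 1000), (1500, 1600), (2800, 2900)]),
    },
    # Same structure as tA1, listed with exons out of order.
    "db_B": {"tB1": ("chr1", "+", [(1500, 1600), (900, 1000), (2100, 2200)])},
}
observed = {("chr1", "+", 1000, 1500), ("chr1", "+", 1600, 2100)}

combined = combine_accurate(databases, observed)
for chain, sources in combined.items():
    print(len(chain), sorted(sources))  # 2 ['db_A', 'db_B']
```

Keying on the intron chain rather than on transcript IDs lets identical structures from different databases collapse into one entry while still recording which databases contributed it; `tA2` is dropped because its second junction has no RNA-Seq support.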
