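We identified RNA editing sites in chimpanzee from transcriptome sequencing (RNA-Seq) data and explored the recognition mechanism of RNA editing and its potential functional impact. RNA-DNA mismatch sites were detected from alignments of chimpanzee RNA-Seq data to the genome sequence and used to build a candidate set of editing sites. Sites with low genome or transcriptome sequencing quality were removed; further filters addressed sequencing uncertainty at 3′ read ends, read coverage, SNP sites, and estimated editing level. A binomial statistical model with Bonferroni multiple-testing correction was then used to remove random errors from the candidate set, yielding the RNA editing sites. Editing sites located in known genes were selected for functional analysis, and the Two Sample Logo software was used to characterize the sequences upstream and downstream of the editing sites. We identified 8 334 RNA editing sites in chimpanzee covering all 12 base-substitution types; 41 of them change the encoded amino acid, and another 3 fall within seed-binding regions of potential microRNA (miRNA) target genes. Statistical analysis showed that 640 and 872 RNA editing sites differed between tissues and between sexes, respectively. Analysis of flanking base frequencies showed significant preferences for the bases immediately adjacent to several types of editing sites. These results indicate that RNA editing is abundant in chimpanzee and potentially has important biological functions, laying a foundation for further study of RNA editing mechanisms in primates.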
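The binomial filtering step can be illustrated with a minimal sketch. It is not the authors' implementation: the per-base sequencing error rate (`error_rate=0.001`), the significance level (`alpha=0.05`), the function names, and the example site identifiers are all assumptions chosen for illustration. The idea is that, for each candidate site, the number of mismatching reads is compared against what random sequencing error alone would produce, and the significance threshold is divided by the number of candidates tested (Bonferroni correction).

```python
from math import comb


def binomial_sf(k, n, p):
    """Upper-tail probability P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def call_editing_sites(candidates, error_rate=0.001, alpha=0.05):
    """Keep candidate sites whose mismatch count is unlikely under random error.

    candidates: list of (site_id, mismatch_reads, total_reads) tuples.
    error_rate and alpha are illustrative assumptions, not values from the paper.
    """
    if not candidates:
        return []
    # Bonferroni correction: divide alpha by the number of tests performed.
    threshold = alpha / len(candidates)
    kept = []
    for site_id, mismatches, coverage in candidates:
        p_value = binomial_sf(mismatches, coverage, error_rate)
        if p_value < threshold:
            kept.append((site_id, p_value))
    return kept


# Hypothetical candidate sites: (identifier, mismatching reads, total reads).
example = [("chr1:100", 8, 20), ("chr1:200", 1, 50), ("chr2:300", 5, 10)]
retained = call_editing_sites(example)
```

In this toy input, a site with a single mismatching read out of 50 is not distinguishable from sequencing error after correction, whereas sites with many mismatching reads are retained.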
RNA editing is a widespread post-transcriptional modification mechanism that alters genetic information at the RNA level through nucleotide insertions, deletions or substitutions, and thereby contributes to the diversification of the transcriptome and proteome. Although tens of thousands of A-to-I RNA editing events have been found in humans, little is known about RNA editing in nonhuman primates. To explore the mechanism and potential functions of RNA editing in chimpanzee, we identified RNA editing sites from chimpanzee RNA-Seq data. RNA-Seq reads were aligned to the chimpanzee genome sequence with the TopHat software, and all RNA-DNA mismatch sites were taken as the candidate set. Low-quality sites were filtered out using both genome and transcriptome sequencing quality scores. Further filters, covering sequencing uncertainty at 3'-terminal positions, read coverage, SNP sites and estimated editing level, were also applied to the candidate set. A statistical test based on the binomial distribution, with Bonferroni multiple-testing correction, was performed on each candidate site to remove random errors between genome and transcriptome. We then detected tissue- and sex-specific RNA editing sites using Fisher's exact test with Bonferroni multiple-testing correction. The Two Sample Logo software was used to analyze the features of the sequences surrounding the RNA editing sites. A total of 8 334 RNA editing sites were identified in the chimpanzee transcriptome, and all 12 possible categories of discordance were observed. The four most frequent types were A-to-G, U-to-C, G-to-A and C-to-U, with 1 995, 1 452, 1 293 and 1 101 sites, respectively. Forty-one editing sites alter amino acid residues; one of them creates a new stop codon, which may shorten the KRT31 protein and affect its activity. Three editing sites potentially disrupt microRNA binding. Finally, 640 and 872 RNA editing sites were identified