High-throughput RNA sequencing (RNA-seq) has become the most effective method for analyzing alternative splicing events in different cell types. The first step in processing RNA-seq data is to accurately align millions of sequencing reads to a reference sequence, a task known as transcriptome sequence alignment. Almost all existing alignment tools rely on canonical splice site signals, which to some extent limits the ability of RNA-seq to discover novel splice sites. We therefore designed RNAMap, a transcriptome sequence alignment method that does not depend on splice site signals. RNAMap divides each read into overlapping seeds and scans the reference sequence with a window bounded by left and right anchors, thereby locating the splice sites contained within the seeds. Computational experiments show that RNAMap achieves a precision of over 95% and a recall clearly higher than that of the other algorithms.
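The seed-and-anchor idea can be illustrated with a minimal sketch. This is not the RNAMap implementation: the function names (`overlapping_seeds`, `find_all`, `find_junction`), the parameters (`seed_len`, `step`, `anchor_len`, `max_intron`), and the restriction to exact matching are all assumptions made for illustration. The sketch splits a read into overlapping seeds, then looks for a seed whose left and right anchors both occur in the reference but lie further apart than in the seed, and treats the extra gap as a candidate intron without consulting any splice site signal.

```python
def overlapping_seeds(read, seed_len, step):
    """Split a read into seeds of length seed_len; step < seed_len makes them overlap."""
    if len(read) <= seed_len:
        return [(0, read)]
    last = len(read) - seed_len
    starts = list(range(0, last + 1, step))
    if starts[-1] != last:
        starts.append(last)  # make sure the end of the read is covered
    return [(s, read[s:s + seed_len]) for s in starts]


def find_all(text, pattern):
    """Yield every start position of pattern in text (overlaps allowed)."""
    pos = text.find(pattern)
    while pos != -1:
        yield pos
        pos = text.find(pattern, pos + 1)


def find_junction(seed, reference, anchor_len=4, max_intron=50):
    """Return (intron_start, intron_end) pairs, end exclusive, for exact spliced hits.

    Hypothetical simplification: exact matches only, one junction per seed,
    and the seed must be at least 2 * anchor_len long.
    """
    n = len(seed)
    left, right = seed[:anchor_len], seed[n - anchor_len:]
    hits = []
    for lpos in find_all(reference, left):
        unspliced = lpos + n - anchor_len  # right-anchor start if there were no intron
        for rpos in find_all(reference, right):
            gap = rpos - unspliced  # candidate intron length
            if not 0 < gap <= max_intron:
                continue
            # Try each split point between the two anchors, leftmost first.
            for k in range(anchor_len, n - anchor_len + 1):
                if (seed[:k] == reference[lpos:lpos + k]
                        and seed[k:] == reference[lpos + k + gap:lpos + n + gap]):
                    hits.append((lpos + k, lpos + k + gap))
                    break
    return hits


# Toy reference: exon, non-canonical intron (TT...GT rather than GT...AG), exon.
exon1, intron, exon2 = "GATTACAGGC", "TTCTTTCTTTGT", "AGCTAGCATG"
reference = exon1 + intron + exon2
seed = exon1[-6:] + exon2[:6]  # a seed spanning the exon-exon junction
junctions = find_junction(seed, reference)
```

In this toy case the intron is recovered from anchor positions alone, even though it lacks the canonical GT-AG boundaries. A practical aligner would additionally need an index over the reference, mismatch tolerance, and a way to resolve ambiguous junction positions, none of which are attempted here.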