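Target-gene sequences obtained by Sanger sequencing often contain both the target gene and vector sequence. To remove vector sequence from sequenced target genes quickly, a new vector-removal method was proposed and a program, Vector cleaner, was developed. First, the program reads primer information and the target-gene sequencing reads in batch. Second, it slides a window half the length of the primer along each primer sequence to generate seeds. By counting how many seeds match a sequencing read, it locates the primers and deletes the vector sequence flanking them. Finally, it compares the number of seeds from the upstream primer and from its reverse complement that match the read, and uses this comparison to identify the sense strand and convert antisense reads. Vector cleaner was tested on 12 sequencing reads of the GhVIN1 gene and compared with Seqclean and SeqMan. The results show that Vector cleaner effectively removes vector sequence from sequencing reads of the cotton GhVIN1 gene, and that it identifies and converts antisense-strand sequences. Compared with Seqclean and SeqMan, Vector cleaner showed higher accuracy and greater sensitivity. Sequence-level accuracy (correctly cleaned sequences out of the total) was 100%, 100%, and 91.6% for Vector cleaner, SeqMan, and Seqclean, respectively. Total base accuracy was 99.90%, 99.00%, and 94.33%, respectively. Compared with similar software, Vector cleaner is better suited to laboratory staff who need to remove vector sequence from sequenced target genes in batch. It offers high accuracy, high sensitivity, and automatic conversion of antisense strands.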
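The seed-and-count procedure can be illustrated with a minimal Python sketch. This is not the authors' Perl code. The function names `make_seeds`, `count_seed_hits`, `locate_primer`, and `clean_read` are hypothetical, and several details are assumptions rather than the published method:

- Seeds are matched exactly.
- Each matching seed votes for a primer start at "hit position minus seed offset", and the most common vote wins.
- The primers are kept in the trimmed product.
- The strand check runs before trimming, so both primers are searched on the sense orientation.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Complement table; characters other than A/C/G/T/N pass through unchanged.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def make_seeds(primer: str) -> List[Tuple[int, str]]:
    """Slide a window of half the primer length along the primer, one base
    at a time, and return (offset_in_primer, seed) pairs."""
    k = max(len(primer) // 2, 1)
    return [(i, primer[i:i + k]) for i in range(len(primer) - k + 1)]


def count_seed_hits(primer: str, read: str) -> int:
    """Count how many distinct seeds of `primer` occur exactly in `read`."""
    return sum(1 for _, seed in make_seeds(primer) if seed in read)


def locate_primer(primer: str, read: str) -> Optional[int]:
    """Estimate where `primer` starts in `read`.

    Assumption of this sketch: every exact seed hit votes for
    (hit position - seed offset) and the most common vote wins, so a
    sequencing error inside the primer only removes the seeds covering it.
    """
    votes: Counter = Counter()
    for offset, seed in make_seeds(primer):
        pos = read.find(seed)
        while pos != -1:
            votes[pos - offset] += 1
            pos = read.find(seed, pos + 1)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


def clean_read(read: str, fwd_primer: str,
               rev_primer: str) -> Tuple[Optional[str], bool]:
    """Return (trimmed sense-strand sequence or None, read_was_antisense)."""
    read, fwd = read.upper(), fwd_primer.upper()
    # In a sense read the downstream primer appears as its reverse complement.
    rev_rc = revcomp(rev_primer.upper())

    # Strand check: more seed hits for revcomp(upstream primer) than for the
    # upstream primer itself means the read is antisense, so flip it.
    antisense = count_seed_hits(revcomp(fwd), read) > count_seed_hits(fwd, read)
    if antisense:
        read = revcomp(read)

    # Locate both primers and drop the vector sequence outside them.
    start = locate_primer(fwd, read)
    end = locate_primer(rev_rc, read)
    if start is None or end is None:
        return None, antisense
    start = max(start, 0)
    stop = min(end + len(rev_rc), len(read))
    if stop <= start:
        return None, antisense
    return read[start:stop], antisense
```

Consider a read laid out as vector, upstream primer, insert, reverse-complemented downstream primer, vector. In this simple case, `clean_read(read, fwd, rev)` should return the primer-to-primer product with `False`. Passing the reverse complement of the same read should return the same product with `True`. Voting on the implied start position is one way to turn seed-match counts into a primer location; the published tool may resolve positions differently.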
Target-gene sequences produced by automated Sanger sequencing machines frequently contain fragments of vector sequence. To remove this vector sequence and to convert antisense-strand reads to the sense strand, a novel method was proposed and a small program, Vector cleaner, was developed in Perl. Its key feature is that it removes vector sequences in batch and converts antisense-strand sequences to sense-strand sequences. Vector cleaner works in three steps. First, it reads primer information and target-gene sequencing reads from input files. Second, it slides a window half the length of the primer along each primer, one base at a time, to generate seeds. The seeds are used to scan the target-gene sequence for perfect matches; in this phase, Vector cleaner identifies the primers and removes the vector sequences flanking them. Third, it identifies the sense strand by comparing the number of seed matches for the upstream primer with the number for its reverse complement. In this study, the method was compared with two tools of similar function, SeqMan and Seqclean, using 12 sequencing results of the cotton gene GhVIN1. The 12 sequences were amplified from Gossypium arboreum cv. JLZM and Gossypium raimondii. The cDNA fragments were cloned into the pMD19-T vector and sequenced. Seqclean screens for vector sequence using NCBI's UniVec database and was run with default parameters. SeqMan was run with default parameters after the pMD19-T plasmid sequence was imported. The outputs of Vector cleaner, SeqMan, and Seqclean were analysed with the multiple sequence alignment software Clustal X. The results showed that Vector cleaner successfully removed the vector sequences from the cotton GhVIN1 reads. It also exported detailed results, including primer information, product size, and target-gene sequence, to an Excel file. The sequences GhVIN1-1, GhVIN1-2, GhVIN1-3, GhVIN1-4, GhVIN1-7, GhVIN1-8, GhVIN1-10, and GhVIN1-12 were detected as antisense-strand sequences and automati