Identifying special codons is an important problem in codon usage research and plays a key role in the experimental design of codon optimization in genetic engineering. High-frequency codons (HFC) offer one approach to this identification, but the criterion used to define them remains problematic. This paper presents HighCodon, a high-throughput software tool for high-frequency codon analysis that removes the bottleneck in testing the applicability of that criterion on large volumes of sequence data, giving related research a practical tool. The software consists of three main functional modules: an input parsing module, a codon usage table (CUT) generation module, and a high-frequency codon identification module. It has three notable features: (i) multiple data sources: it accepts three types of input, namely local sequence files in FASTA format, local CUT files in CUT format, and remote CUT addresses in the Codon Usage Database (CUD), thereby covering both local and remote data; (ii) high flexibility: the three input types can be mixed; and (iii) high throughput: multiple input records are processed in batch. HighCodon integrates well with the important online service CUD, making it convenient to retrieve the CUTs of up to 35,799 species and analyze their high-frequency codons at the same time. In addition, to facilitate the storage and exchange of CUT data, this paper proposes a CUT format modeled on the FASTA format.
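To make the codon usage table generation step concrete, the following is a minimal Python sketch of how a CUT could be derived from FASTA-formatted sequences. It is not HighCodon's implementation: the function names (`parse_fasta`, `codon_counts`, `codon_usage_table`) are hypothetical, the per-thousand frequency is assumed here as a common reporting convention, and the sketch assumes in-frame coding sequences. It does not reproduce the CUT file format proposed in the paper, nor the HFC criterion, since neither is specified in this abstract.

```python
from collections import Counter


def parse_fasta(text):
    """Return {header: sequence} from FASTA-formatted text."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    return {h: "".join(parts) for h, parts in records.items()}


def codon_counts(seq):
    """Count in-frame codons, skipping trailing bases and non-ACGT codons."""
    counts = Counter()
    usable = len(seq) - len(seq) % 3
    for i in range(0, usable, 3):
        codon = seq[i:i + 3]
        if set(codon) <= set("ACGT"):
            counts[codon] += 1
    return counts


def codon_usage_table(fasta_text):
    """Pool all records and return {codon: (count, frequency per thousand)}."""
    total = Counter()
    for seq in parse_fasta(fasta_text).values():
        total += codon_counts(seq)
    n = sum(total.values())
    if n == 0:
        return {}
    return {codon: (count, 1000.0 * count / n) for codon, count in total.items()}


# Illustrative input only; these are not real gene sequences.
example = ">gene1\nATGGCTGCTAAA\n>gene2\nATGGCTTAA\n"
table = codon_usage_table(example)
for codon in sorted(table):
    count, per_thousand = table[codon]
    print(codon, count, round(per_thousand, 1))
```

Pooling counts across all records before computing frequencies gives one table per input file. A per-record table would instead call `codon_counts` on each sequence separately.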