In bioinformatics, discovering and identifying new genes is a pivotal step that links past and future work: it builds on earlier efforts such as genome sequencing and lays the foundation for research in the post-genome era. "In silico gene cloning" is a method that uses computational means to discover and identify new genes, and the SiClone software implements it. This paper proposes a parallel processing scheme for the database on which SiClone operates and describes in detail PSiClone, an optimized parallel version implemented on the MPI (Message Passing Interface) platform. The run-time performance of PSiClone is demonstrated on an available EST database. The number of EST sequences in this test database is only a very small fraction of the large NCBI (National Center for Biotechnology Information) dbEST database, which suggests that the parallel version holds even more promise for comparison and computation on large databases.
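The abstract does not say how PSiClone divides the EST database among MPI processes. One common approach, assumed here purely for illustration, is to give each process a contiguous block of sequences of nearly equal size. The sketch below shows only that index arithmetic in plain Python; the names `block_range` and `partition` are hypothetical, and in an actual MPI program the rank and the number of processes would come from the communicator rather than from function arguments.

```python
def block_range(n_items, n_ranks, rank):
    """Return the half-open range [start, stop) of item indices owned by `rank`.

    Assumption for illustration: items are split into contiguous blocks whose
    sizes differ by at most one; the first `n_items % n_ranks` ranks get the
    extra item.
    """
    if n_ranks <= 0 or not 0 <= rank < n_ranks:
        raise ValueError("rank must satisfy 0 <= rank < n_ranks")
    base, extra = divmod(n_items, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def partition(sequences, n_ranks):
    """Split a list of sequences into one contiguous block per rank."""
    blocks = []
    for rank in range(n_ranks):
        start, stop = block_range(len(sequences), n_ranks, rank)
        blocks.append(sequences[start:stop])
    return blocks


if __name__ == "__main__":
    # Toy stand-ins for EST records; real entries would be parsed from a file.
    ests = ["EST%02d" % i for i in range(10)]
    for rank, block in enumerate(partition(ests, 3)):
        print(rank, block)
```

Contiguous blocks keep each process's share of the input in one piece, which makes reading it from a sequence file simple. If per-sequence cost varies widely, a dynamic work queue may balance the load better; which strategy PSiClone uses is not stated here.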