Identifying conserved sequences in DNA is computationally expensive. A parallel algorithm for recognizing transcription factor binding sites (TFBS) was therefore developed; it discovers a sequence pattern (motif) of a specified length in a set of DNA sequences. The algorithm measures the conservation of a candidate pattern with a probabilistic model and searches for conserved sequences iteratively. The corresponding program was written in C using the MPI message-passing model and run on a three-node cluster built on the Windows platform. It was tested on a simulated dataset of 20 sequences, each 200 bases long, and identified the motif successfully. The results indicate that the parallel algorithm improves the efficiency of motif recognition.
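The abstract does not specify the probabilistic model, the iteration scheme, or how work is divided among the MPI processes, so the following C fragment is only an illustrative sketch under stated assumptions. It assumes a position-specific frequency model with a pseudocount of 1 per base, a uniform background of 0.25 per base, a greedy update of one sequence's motif start per step, and a block distribution of items over ranks. The names `motif_score`, `best_start`, and `block_range` are hypothetical and do not come from the original program.

```c
#include <assert.h>
#include <stddef.h>

/* Map a nucleotide to 0..3; anything else returns -1 and is skipped. */
static int base_index(char c)
{
    switch (c) {
    case 'A': case 'a': return 0;
    case 'C': case 'c': return 1;
    case 'G': case 'g': return 2;
    case 'T': case 't': return 3;
    default:            return -1;
    }
}

/*
 * Likelihood ratio of the width-w windows seqs[i][pos[i] .. pos[i]+w-1]
 * under a per-column frequency model versus a uniform background (0.25).
 * Assumption: pseudocount of 1 per base. A plain product keeps the sketch
 * short; a sum of logarithms is the usual choice for long inputs because
 * the product can overflow when n * w is large.
 */
double motif_score(const char *const *seqs, const size_t *pos,
                   size_t n, size_t w)
{
    assert(n > 0 && w > 0);
    double ratio = 1.0;
    for (size_t j = 0; j < w; j++) {
        double count[4] = {1.0, 1.0, 1.0, 1.0}; /* pseudocounts */
        double total = 4.0;
        for (size_t i = 0; i < n; i++) {        /* estimate column j */
            int b = base_index(seqs[i][pos[i] + j]);
            if (b >= 0) { count[b] += 1.0; total += 1.0; }
        }
        for (size_t i = 0; i < n; i++) {        /* score column j */
            int b = base_index(seqs[i][pos[i] + j]);
            if (b >= 0) ratio *= (count[b] / total) / 0.25;
        }
    }
    return ratio;
}

/*
 * One iteration step (assumed greedy variant): keep all other starts
 * fixed and return the start in sequence k that maximizes the score.
 * pos[k] is restored before returning.
 */
size_t best_start(const char *const *seqs, size_t *pos, size_t n,
                  size_t w, size_t k, size_t len_k)
{
    assert(k < n && len_k >= w);
    size_t saved = pos[k], best = pos[k];
    double best_score = -1.0; /* every real score is positive */
    for (size_t s = 0; s + w <= len_k; s++) {
        pos[k] = s;
        double sc = motif_score(seqs, pos, n, w);
        if (sc > best_score) { best_score = sc; best = s; }
    }
    pos[k] = saved;
    return best;
}

/*
 * Half-open range [*lo, *hi) of items handled by one rank under a block
 * distribution. In an MPI program, rank and n_ranks would typically come
 * from MPI_Comm_rank and MPI_Comm_size.
 */
void block_range(size_t n_items, size_t n_ranks, size_t rank,
                 size_t *lo, size_t *hi)
{
    assert(n_ranks > 0 && rank < n_ranks);
    size_t base = n_items / n_ranks, rem = n_items % n_ranks;
    *lo = rank * base + (rank < rem ? rank : rem);
    *hi = *lo + base + (rank < rem ? 1 : 0);
}
```

Under these assumptions, a driver would call `best_start` repeatedly for each sequence until no start position changes, and each of the three processes would use `block_range` to select its share of the work, for example a subset of the 20 sequences or of the candidate start positions. Which of these the original program distributes is not stated in the abstract.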