针对生物序列分析中的多序列比对问题,当输入数据量比较大时,人们提出了很多启发式的算法来改善计算速度和比对结果.提出了用于进行全局DNA多序列比对的一种方:MWPAlign(maximum weighted pathalignment).该算法把序列信息用de Bruijn图的形式表示,并将输入序列的信息记录在图的边上,这样,就将求调和序列的问题转化为求图的最大权值路径问题,使多序列比对问题的时间复杂度降低到几乎线性.实验结果显示:MWPAlign是可行的多序列比对算法,尤其对于变异率低于5.2%的大量序列数据,相对于CLUSTALW(cluster alignments weight),T-Coffee和HMMT(hidden Markov model training)有较好的比对结果和运算性能.
For multiple sequences alignment problem in molecular biological sequence analysis, when the input sequence number is very large, many heuristic algorithms have been proposed to improve the computation speed and the quality of alignment. An approach called MWPAlign (maximum weighted path alignment) is presented to do global multiple alignment for DNA sequences. In this method, a de Bruijn graph is used to express the input sequences information, which is recorded in the edges of the graph. As a result, a consensus-finding problem can be transformed to a maximum weighted path problem of the graph. MWPAlign obtains almost linear computation speed of the multiple sequences alignment problem. Experimental results show that the proposed algorithm is feasible, and for a large number of sequences with mutation rate lower than 5.2%, MWPALign can obtain better alignment results and has lower computational time as compared to CLUSTALW (cluster alignments weight), T-Coffee and HMMT (hidden Markov model training).