Sequence alignment is an important research direction in bioinformatics. It determines the similarity between two or more sequences, from which their homology can be judged and the evolutionary relationships among them inferred. At present, the heuristic sequence alignment algorithm BLAST has important applications in practical problems. The algorithm has a parameter called the seed, which is the key to controlling both the speed and the sensitivity of the alignment. However, the seed length is a fixed value chosen from experience, and this empirical value does not suit alignment problems for sequences of all lengths. Heuristic alignment of two sequences of different lengths therefore requires a seed of reasonable length if the alignment is to be efficient and fast. This paper applies probabilistic reasoning to analyze the seed length for aligning sequences of different lengths and, on that basis, calculates the alignment sensitivity of a seed of a given length. Theoretical derivation and experimental analysis show that the seed length computed for a given sensitivity is feasible and effective. This provides a guarantee for optimizing fast heuristic sequence alignment at high sensitivity, that is, a sensitivity nearly equal to that of the dynamic programming algorithm.
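To make the relationship between seed length and sensitivity concrete, the following is a minimal sketch under a simplified model that is an assumption of this example and not necessarily the derivation used in the paper: each position of a homologous region of length `region_len` matches independently with probability `match_prob`, and a contiguous seed of length `seed_len` produces a hit when the region contains at least `seed_len` consecutive matches. The function names and parameters are hypothetical.

```python
def seed_hit_probability(region_len: int, seed_len: int, match_prob: float) -> float:
    """Probability that region_len independent positions, each matching with
    probability match_prob, contain a run of at least seed_len matches.

    Assumed toy model (i.i.d. matches, contiguous seed), for illustration only.
    """
    if seed_len <= 0:
        raise ValueError("seed_len must be positive")
    if seed_len > region_len:
        return 0.0
    # state[j] = P(no hit so far and the current trailing run of matches has length j)
    state = [0.0] * seed_len
    state[0] = 1.0
    for _ in range(region_len):
        nxt = [0.0] * seed_len
        # A mismatch resets the trailing run to length 0.
        nxt[0] = sum(state) * (1.0 - match_prob)
        # A match extends the trailing run by one position.
        for j in range(1, seed_len):
            nxt[j] = state[j - 1] * match_prob
        # Extending a run of length seed_len - 1 yields a hit; that mass is dropped.
        state = nxt
    return 1.0 - sum(state)


def longest_seed_for_sensitivity(region_len: int, match_prob: float, target: float) -> int:
    """Longest seed length whose hit probability is still at least target.

    Returns 0 if even a seed of length 1 falls short of the target.
    """
    best = 0
    for k in range(1, region_len + 1):
        # The hit probability never increases as the seed grows, so stop at the first failure.
        if seed_hit_probability(region_len, k, match_prob) >= target:
            best = k
        else:
            break
    return best
```

Under this model a longer seed reduces the number of random hits to be extended (faster search) but lowers the hit probability (lower sensitivity), so `longest_seed_for_sensitivity` picks the longest seed that still meets a chosen sensitivity for a given region length.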