单核苷酸多态性(SNP)单体型装配问题就是从给定的来自某人染色体的SNP片段中去除错误,重构出醛可能与原来片段一致的单体型.这个问题有几个不同的模型:最少片段去除(MFR)问题,最少SNP去除(MSR)问题以及最少错误纠正(MEC)问题.前两个问题的复杂性与算法已有一些学者研究过.第三个问题已被证明是NP完全问题,但这个问题的实际算法还没有,该文对MEC问题给出了一个分支定界算法,这个算法能得到问题的全局最优解.通过这个算法对实际数据的计算说明了MEC模型的合理性,即在一定条件下,通过修正最少的错误重构出的单体型确实是真实的单体型.由于分支定界算法对这样一个NP完全问题不能在叮接受的时间内解规模较大的问题,文中又给出了求解MEC问题的两个基于动态聚类的算法,以便对规模较大的问题在可接受的时间内得到近似最优解.数值实际表明这两个算法很快,很有效.这两个算法总能得到与分支定界找到的全局最优解很接近的近似最优解.鉴于MEC问题是NP完全的,这两个算法是有效的、实际的算法.
The SNP haplotype assembly problem is to reconstruct a maximally consistent pair of haplotypes by removing errors from a set of SNP fragments of one individual's DNA. There are several models for the problem: Minimum Fragment Removal(MFR)problem,Minimum SNP Removal(MSR) problem, and Minimum Error Correction (MEC)problem. Complexity and practical algorithms for the first two problems have been studied by several authors. The third problem has been proved to be NP-complete,but there are few practical algorithms for it. In this paper,a branch and bound algorithm to solve the MEC problem for globally optimal solutions is given. And this algorithm is used to illustrate biological rationality of MEC model according to computational results based on real data,i, e. ,under certain conditions the haplotypes generated through minimum number of error correction are indeed the real haplotypes. Since the branch and bound algorithm can not solve the large size problems within acceptable time, two algorithms based on dynamic clustering for MEC problem are given to obtain approximately-optimal solutions. Numerical experiments show that the algorithms are very last and have so good performance that they can find good solutions. Since MEC is an NP-hard problem,these algorithms are effective and practical.