根据DNA测序片段数据的特点,提出了一个时间复杂度为0(nk2^2k2+mlogm+mk1)的单体型组装问题MEC/GI模型的参数化算法,其中m为片段数,n为单体型的SNP位点数,k1为一个片段覆盖的最大SNP位点数(通常小于10),k2为覆盖同一SNP位点的片段的最大数(通常不大于10)。对于实际DNA测序中的片段数据,即使m和n都相当大,该算法也可以在较短的时间得到MEC/GI模型的精确解,具有良好的可扩展性和较高的实用价值。
Based on the characters of DNA fragments, the paper introduces a parameterized algorithm for the minimum error correction with genotype information (MEC/GI) model of the haplotype assembly problem. Its time complexity is O( nk22^k2 + mlogm + mk1), where m is the number of fragments, n is the number of SNP sites, k1 is the maximum number of SNP sites that a fragment covers (usually less than 10), and k2 is the maximum number of the fragments covering a SNP site (usually not greater than 10). For the practical fragment data, the algorithm can solve the MEC/GI problem efficiently even if m and n are rather great, and it is scalable and applicable in practice.