New sequencing technologies generate very short sequence fragments (reads). To assemble such short reads efficiently, a method is proposed that encodes each read, maps it into a custom table designed for fast lookup, and then applies BPM, an efficient bit-parallel approximate (fuzzy) string matching algorithm, to find long connected paths in that table, thereby achieving fast assembly of the short reads. Experimental results show that on 500M of high-quality source data the algorithm runs in 136 s and reaches an accuracy of up to 79% and a coverage of up to 82%; on 500M of source data with a 0.1% error rate it runs in 150 s and reaches an accuracy of up to 72% and a coverage of up to 73%. The assembly task is thus completed well within a short time.
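The two building blocks named above, read encoding and bit-parallel approximate matching, can be illustrated with a minimal Python sketch. It assumes that BPM refers to Myers' bit-vector algorithm and that reads are packed at 2 bits per base. The function names (`encode`, `build_prefix_table`, `bpm_search`) and the prefix-keyed table layout are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: the 2-bit encoding, the prefix-keyed table, and all
# names below are assumptions, not the paper's actual data structures.
from collections import defaultdict

BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(seq):
    """Pack a DNA string into an integer, 2 bits per base."""
    code = 0
    for base in seq:
        code = (code << 2) | BASE_CODE[base]
    return code


def build_prefix_table(reads, k):
    """Map the encoded k-base prefix of each read to the reads starting with it."""
    table = defaultdict(list)
    for read in reads:
        if len(read) >= k:
            table[encode(read[:k])].append(read)
    return table


def bpm_search(pattern, text, max_errors):
    """Myers-style bit-parallel approximate search.

    Returns (end_index, edit_distance) for every position in `text` where
    `pattern` matches some substring ending there with at most `max_errors`
    edits (substitutions, insertions, deletions).
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    mask = (1 << m) - 1          # keep Python's unbounded ints to m bits
    high = 1 << (m - 1)          # bit of the last pattern row
    peq = {}                     # per-character match bit-vectors
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)

    pv, mv, score = mask, 0, m   # vertical +1 / -1 deltas, last-row score
    hits = []
    for j, ch in enumerate(text):
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & mask
        ph = (mv | ~(xh | pv)) & mask    # horizontal +1 deltas
        mh = pv & xh                     # horizontal -1 deltas
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        ph = (ph << 1) & mask            # no low bit set: a match may start anywhere
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
        if score <= max_errors:
            hits.append((j, score))
    return hits
```

For example, `bpm_search("ACGT", "TTACGTTT", 0)` reports the single exact occurrence ending at index 5. Each text character costs a constant number of word operations as long as the pattern fits in a machine word, which is what makes this family of algorithms attractive for comparing many short reads.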