Building on existing DNA sequence compression algorithms, DNADCompress, a dictionary-based DNA sequence compression algorithm, was designed and implemented with both storage efficiency and biological interpretability in mind. The core of the algorithm consists of three parts: building a dictionary of repeated substrings, selecting dictionary entries, and encoding the compressed strings. Experimental results show that its compression performance is on par with commonly used DNA sequence compression algorithms, and that it provides a basis for the biological interpretation of sequences.
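The three stages named in the abstract can be illustrated with a toy sketch. This is not the DNADCompress implementation, whose details are not given here. It assumes fixed-length substrings of length `k`, a frequency-based selection rule, and a greedy token encoding. The function names `build_dictionary`, `select_items`, `encode`, and `decode` are hypothetical.

```python
from collections import Counter


def build_dictionary(seq, k):
    """Stage 1 (assumed form): collect length-k substrings that occur more than once."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {s: n for s, n in counts.items() if n > 1}


def select_items(candidates, max_items):
    """Stage 2 (assumed rule): keep the most frequent candidates, ties broken alphabetically."""
    ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))
    return [s for s, _ in ranked[:max_items]]


def encode(seq, items, k):
    """Stage 3 (assumed scheme): greedy left-to-right scan emitting
    ("ref", dictionary index) or ("lit", single base) tokens."""
    index = {s: i for i, s in enumerate(items)}
    tokens = []
    i = 0
    while i < len(seq):
        chunk = seq[i:i + k]
        if chunk in index:
            tokens.append(("ref", index[chunk]))
            i += k
        else:
            tokens.append(("lit", seq[i]))
            i += 1
    return tokens


def decode(tokens, items):
    """Inverse of encode: expand references back into their dictionary strings."""
    return "".join(items[v] if kind == "ref" else v for kind, v in tokens)


if __name__ == "__main__":
    seq = "ACGTACGTTTACGT"
    candidates = build_dictionary(seq, 4)
    items = select_items(candidates, 1)
    tokens = encode(seq, items, 4)
    print(items, tokens, decode(tokens, items) == seq)
```

In a sketch like this, the selected dictionary entries are themselves the repeated motifs of the sequence, which is one way a dictionary-based scheme could support the biological interpretation the abstract mentions.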