To address the shortcomings of traditional methods for DNA sequence similarity analysis, this paper proposes a new similarity analysis algorithm based on information quantity. Each DNA sequence is treated as a signal sequence over the symbol set {A, C, G, T}, and all sequences to be compared are combined into an information system whose attribute values are the characters A, C, G, and T. Within the resulting database system, the paper introduces the information quantity, joint information quantity, conditional information quantity, and mutual information quantity of DNA sequences, discusses their properties, and gives several relations among them. A DNA sequence similarity analysis model is then built on this basis. Simulation results show that the method analyzes DNA sequence similarity quickly and effectively, and that it copes well with two difficulties: the very large number of bases in DNA and the differing sequence lengths across species.
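The four quantities named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's definitions, so this sketch substitutes the standard Shannon quantities (entropy, joint entropy, conditional entropy, mutual information) computed over sequence positions. The function names, the truncation of the longer sequence to the shorter length, and the normalized `similarity` score are all illustrative assumptions and not the authors' model.

```python
from collections import Counter
from math import log2


def entropy(*seqs):
    """Shannon entropy in bits of the symbols (or symbol tuples) seen across positions.

    With one sequence this is H(X); with two it is the joint entropy H(X, Y).
    Assumption: Shannon entropy stands in for the paper's "information quantity".
    """
    rows = list(zip(*seqs))  # one tuple of symbols per position
    n = len(rows)
    counts = Counter(rows)
    return -sum(c / n * log2(c / n) for c in counts.values())


def _truncate(x, y):
    # Assumption: compare only the common prefix; the paper handles unequal
    # lengths in its own way, which the abstract does not describe.
    n = min(len(x), len(y))
    return x[:n], y[:n]


def conditional_entropy(x, y):
    """H(X | Y) = H(X, Y) - H(Y)."""
    x, y = _truncate(x, y)
    return entropy(x, y) - entropy(y)


def mutual_information(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    x, y = _truncate(x, y)
    return entropy(x) + entropy(y) - entropy(x, y)


def similarity(x, y):
    """Hypothetical score in [0, 1]: mutual information divided by joint entropy."""
    x, y = _truncate(x, y)
    joint = entropy(x, y)
    if joint == 0:  # both sequences are constant, so they carry no information
        return 1.0
    return mutual_information(x, y) / joint
```

Under these assumptions, `similarity("ACGT", "ACGT")` is 1 because one sequence fully determines the other, while `similarity("AACC", "ACAC")` is 0 because the symbols at each position are statistically independent. The identities used in the code, H(X|Y) = H(X,Y) - H(Y) and I(X;Y) = H(X) + H(Y) - H(X,Y), are the kind of relations among the quantities that the abstract refers to.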