Alignment-free sequence comparison is a recently developed approach to sequence analysis. It is computationally efficient and remains applicable to sequences with low similarity, and it has been applied successfully to DNA sequences. For proteins, however, its accuracy is low, because protein sequences are built from 20 types of residues and are therefore more complex than DNA sequences. To address this, the 20 natural amino acid residues are clustered into a small number of groups according to the similarity of their physicochemical properties. The resulting simplified alphabets reduce the complexity of protein sequences while retaining the key information encoded in them, which effectively improves the accuracy of alignment-free comparison.
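The idea can be illustrated with a minimal Python sketch. The abstract does not give the actual residue grouping or the comparison measure, so both are assumptions here: the six-group alphabet in `GROUPS` is a hypothetical grouping by broad physicochemical class, and the distance is a plain Euclidean distance between k-mer frequency vectors, one common alignment-free measure. The function names are likewise illustrative.

```python
import math
from collections import Counter

# Hypothetical six-group reduced alphabet (an assumption for illustration,
# not the grouping used in the source).
GROUPS = {
    "a": "AVLIMC",  # aliphatic / hydrophobic
    "r": "FWY",     # aromatic
    "p": "STNQ",    # polar, uncharged
    "k": "KRH",     # positively charged
    "d": "DE",      # negatively charged
    "g": "GP",      # glycine and proline
}
RESIDUE_TO_GROUP = {res: label for label, members in GROUPS.items() for res in members}


def reduce_sequence(seq):
    """Rewrite a protein sequence in the reduced alphabet; unknown symbols are skipped."""
    return "".join(RESIDUE_TO_GROUP[r] for r in seq.upper() if r in RESIDUE_TO_GROUP)


def kmer_frequencies(seq, k=2):
    """Relative frequencies of overlapping k-mers in the reduced sequence."""
    reduced = reduce_sequence(seq)
    counts = Counter(reduced[i:i + k] for i in range(len(reduced) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def distance(seq_a, seq_b, k=2):
    """Euclidean distance between the k-mer frequency vectors of two sequences."""
    freq_a = kmer_frequencies(seq_a, k)
    freq_b = kmer_frequencies(seq_b, k)
    kmers = set(freq_a) | set(freq_b)
    return math.sqrt(sum((freq_a.get(m, 0.0) - freq_b.get(m, 0.0)) ** 2 for m in kmers))
```

With this grouping, `AVLK` and `ILMR` share no residue, yet both reduce to `aaak`, so their distance is zero. With k = 2 the feature space shrinks from 20² = 400 possible dipeptides to 6² = 36, which is the reduction in complexity described above.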