A self-consistent information clustering method is applied to assess the strength of the Shine-Dalgarno (SD) sequences of protein-coding genes in Escherichia coli, and 17 base correlation modes that make up strong SD sequences are identified. All SD sequences are grouped by strength into three classes: strong, medium, and weak. Different classes have different most-preferred modes: GGAGG is the most preferred mode in weak SD sequences, whereas AAGGA is the most preferred mode in strong SD sequences. The same mode also plays a different regulatory role depending on its distance from the translation initiation codon. For example, the base A of the GGAG mode occurs mostly at position -8 in strong SD sequences but more often at position -7 or -9 in weak SD sequences. Averaged over all SD sequences, G is the most probable base at position -9. The results also show that stronger SD sequences are associated with higher gene expression levels and weaker SD sequences with lower ones. The degree of pairing between the SD and anti-SD sequences and the relative position of the SD sequence affect both the recognition of the translation initiation codon and translation efficiency.
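To make the position statements above concrete (for example, the base A of GGAG at position -8), the following minimal Python sketch reports where a chosen base of a mode falls relative to the initiation codon. It is an illustration only, not the clustering method of this work. The function name `motif_base_positions`, the example sequence, and the numbering convention (the base immediately upstream of the initiation codon is position -1) are all assumptions introduced here.

```python
def motif_base_positions(upstream: str, motif: str, base_index: int) -> list[int]:
    """Return positions, relative to the initiation codon, of one base of `motif`.

    `upstream` is the sequence immediately 5' of the initiation codon, written
    5'->3'. Assumed convention: its last character is position -1.
    `base_index` is the 0-based index of the base of interest inside `motif`.
    """
    positions = []
    start = upstream.find(motif)
    while start != -1:
        # Index i in `upstream` corresponds to position i - len(upstream).
        positions.append(start + base_index - len(upstream))
        # Advance by one so overlapping occurrences are also reported.
        start = upstream.find(motif, start + 1)
    return positions


# Made-up 20-nt upstream region, for illustration only (not real data).
example_upstream = "TTTACATTAAGGAGTTTACC"

# Position of the A (index 2) within GGAG.
print(motif_base_positions(example_upstream, "GGAG", 2))  # [-8]
```

Applying such a function to every gene in a strength class and tallying the returned positions would give a position distribution for a mode in that class, under the same numbering assumption.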