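【Aim】To identify the CPF family cuticular protein genes in the genome of Anopheles sinensis, analyze their gene structure and characteristics, and infer their possible biological functions; and to compare the CPF family genes of representative mosquito species in order to provide an information framework for this gene family. 【Methods】Based on the whole-genome sequences of An. sinensis, An. gambiae, An. minimus, Aedes aegypti, Culex quinquefasciatus and Drosophila melanogaster, the CPF family genes of these species were identified with BLASTP, TBLASTN and HMM searches, using the An. gambiae CPF family gene sequences as queries. Bioinformatics methods were used to predict the structure, splicing patterns, signal peptides, transmembrane regions, domains and 3D structure of the An. sinensis CPF family genes. Phylogenetic relationships among these species were constructed with the maximum likelihood (ML) method to infer the origin and evolution of the CPF family genes. 【Results】The genomes of An. sinensis, An. gambiae, An. minimus, Ae. aegypti, Cx. quinquefasciatus and D. melanogaster contain 4, 4, 4, 3, 3 and 3 CPF family genes, respectively. The CPF genes of An. sinensis were named AsCPF1, AsCPF2, AsCPF3 and AsCPF4; their full-length cDNA sequences are 736, 2 021, 531 and 1 001 bp, encoding 219, 345, 148 and 185 amino acids, respectively. AsCPF1, AsCPF2 and AsCPF3 each contain only one intron, whereas AsCPF4 contains three introns; all introns are phase 0. AsCPF1, AsCPF2, AsCPF3 and AsCPF4 have 3, 2, 1 and 2 alternative splice variants, respectively. AsCPF3 has the highest expression level, followed by AsCPF4, AsCPF2 and AsCPF1. The predicted theoretical molecular weights of AsCPF1, AsCPF2, AsCPF3 and AsCPF4 are 22.86, 36.47, 15.08 and 18.66 kDa, and their isoelectric points are 9.08, 8.97, 9.44 and 9.16, respectively. The AsCPF family proteins contain the conserved 44-amino-acid motif and the C-terminal motif. AsCPF1, AsCPF3 and AsCPF4 have signal peptides and are secretory proteins, whereas AsCPF2 lacks a signal peptide and is a non-secretory protein. Secondary structure analysis shows that all four AsCPFs contain α-helices, random coils and extended strands, and that only AsCPF4 has a transmembrane segment, located at amino acids 5-27. Phylogenetic analysis suggests that CPF3 may be the earliest diverged CPF family gene, and that the CPF1 and CPF2 genes may trace to the same ancestral gene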
【Aim】This study aims to identify the CPF family (CPFs) of cuticular protein genes in the Anopheles sinensis genome, to analyze their structure and characteristics, to infer their possible biological functions, and to compare the CPFs of representative mosquito species so as to provide an information framework for this gene family. 【Methods】We identified the CPFs in the genomes of An. sinensis, An. gambiae, An. minimus, Aedes aegypti, Culex quinquefasciatus and Drosophila melanogaster using BLASTP, TBLASTN and HMM searches with An. gambiae CPFs as queries. We predicted the structure and splice variation of the An. sinensis CPF genes, and the signal peptides, transmembrane regions, structural domains and 3D structure of the An. sinensis CPF proteins, using bioinformatics techniques. We then constructed phylogenetic relationships with the maximum likelihood (ML) method and inferred the origin and evolution of CPFs in these species. 【Results】There are 4, 4, 4, 3, 3 and 3 CPFs in the An. sinensis, An. gambiae, An. minimus, Ae. aegypti, Cx. quinquefasciatus and D. melanogaster genomes, respectively. The CPFs in An. sinensis were named AsCPF1, AsCPF2, AsCPF3 and AsCPF4. Their full-length cDNA sequences are 736, 2 021, 531 and 1 001 bp, encoding 219, 345, 148 and 185 amino acids, respectively. AsCPF1, AsCPF2 and AsCPF3 each have only one intron, whereas AsCPF4 contains three introns; all introns are phase 0. There are 3, 2, 1 and 2 alternative splice variants for AsCPF1, AsCPF2, AsCPF3 and AsCPF4, respectively. AsCPF3 has the highest expression level, followed by AsCPF4, AsCPF2 and AsCPF1. The theoretical molecular weights of AsCPF1, AsCPF2, AsCPF3 and AsCPF4 are 22.86, 36.47, 15.08 and 18.66 kDa, and their isoelectric points are 9.08, 8.97, 9.44 and 9.16, respectively. These AsCPFs contain a conserved 44-amino-acid region and a C-terminal region. All are secretory proteins with signal peptide sequences except AsCPF2, which lacks a signal peptide sequence and is a non-secretory protein. 
All four AsCPFs have alp