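【Aim】Anopheles sinensis is an important malaria vector in China and Southeast Asia. This study identifies and analyzes microsatellites across the whole An. sinensis genome and annotates the functions of microsatellite-associated genes. The aim is to provide a basis for screening molecular genetic markers and a foundation for further comparative genomic studies of insect microsatellites. 【Methods】Microsatellites in the An. sinensis genome were identified with the MISA program. Microsatellite lengths were tallied in Excel 2010, and Perl scripts written for the study calculated microsatellite base content from the microsatellite sequences. Further Perl scripts used the microsatellite positions to locate the gene regions in which microsatellites occur, and the microsatellites in gene regions were given GO functional annotations. WEGO was used to compare the functional annotations of microsatellite-containing genes between An. sinensis and An. gambiae. 【Results】A total of 105 981 microsatellites were identified, at a density of 365.5 per Mb. Of these, 100 391 (94.7%) are perfect microsatellites and the remaining 5 590 (5.3%) are compound microsatellites. Mononucleotide microsatellites are the most abundant (58 837, 55.5% of the total), followed by dinucleotide (30 345, 28.6%), trinucleotide (15 104, 14.3%), tetranucleotide (1 530, 1.4%), pentanucleotide (121, 0.1%) and hexanucleotide (44, less than 0.1%) microsatellites. (A)n is the predominant type, followed by (AC)n, (AG)n, (C)n, (AGC)n, (ATC)n, (ACG)n and (ACC)n, each with more than 2 000 microsatellites. Microsatellites 10–20 bp long predominate (87.1%). The AT content (63%) of these microsatellites is markedly higher than their GC content (37%); only trinucleotide microsatellites have a GC content (53%) slightly higher than their AT content (47%). In total, 90 632 microsatellites (85%) lie in intergenic regions and 15 349 (15%) in gene regions. Within gene regions, 2 782 (3%) lie in exons and 12 567 (12%) in introns. A comparison of the GO annotations of microsatellite-containing genes in An. sinensis and An. gambiae shows that the percentage of total genes in each subcategory is broadly the same in the two species. The exception is electron carrier genes, which account for a markedly higher percentage in An. sinensis (0.9%) than in An. gambiae (0.1%). 【Conclusion】This is the first systematic genome-wide microsatellite study in mosquitoes, providing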
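The identification and base-content steps in the Methods can be illustrated with a short Python sketch. It is not MISA itself or the authors' Perl scripts. The repeat thresholds in `MIN_REPEATS` (10, 6, 5, 5, 5 and 5 repeats for unit sizes 1 to 6) and the 100 bp compound gap are assumptions modelled on MISA's commonly used example configuration, because the abstract does not report the study's settings. The overlap rule is a simplification, and all names are hypothetical. `canonical_motif` shows an assumed but widespread convention for repeat types: a motif, its rotations and its reverse complement are reported as one type. Under that convention (T)n counts as (A)n, and (CA)n, (GT)n and (TG)n all count as (AC)n, which matches how the types are named in the results.

```python
import re
from dataclasses import dataclass

# Assumed MISA-style thresholds: unit size -> minimum number of repeats.
# Modelled on MISA's example configuration, not on settings reported by the study.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
# Assumed maximum distance (bp) between two SSRs merged into one compound SSR.
MAX_COMPOUND_GAP = 100

COMPLEMENT = str.maketrans("ACGT", "TGCA")


@dataclass
class SSR:
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    motif: str

    @property
    def length(self) -> int:
        return self.end - self.start


def is_primitive(motif: str) -> bool:
    """False if the motif is itself a repeat of a shorter unit (e.g. 'ACAC' or 'AA')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def canonical_motif(motif: str) -> str:
    """Pool a motif with its rotations and reverse complement: CA, GT, TG -> AC."""
    rev_comp = motif.translate(COMPLEMENT)[::-1]
    return min(s[i:] + s[:i] for s in (motif, rev_comp) for i in range(len(s)))


def find_perfect_ssrs(seq: str) -> list[SSR]:
    """Find perfect SSRs with 1-6 bp units, keeping the longest hit where hits overlap."""
    seq = seq.upper()
    hits: list[SSR] = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-bp unit followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        hits += [SSR(m.start(), m.end(), m.group(1))
                 for m in pattern.finditer(seq) if is_primitive(m.group(1))]
    kept: list[SSR] = []
    for h in sorted(hits, key=lambda s: (-s.length, s.start)):
        if all(h.end <= o.start or h.start >= o.end for o in kept):
            kept.append(h)
    return sorted(kept, key=lambda s: s.start)


def group_compound(ssrs: list[SSR]) -> list[list[SSR]]:
    """Chain sorted SSRs separated by <= MAX_COMPOUND_GAP bp; groups of 2+ are compound."""
    groups: list[list[SSR]] = []
    for s in ssrs:
        if groups and s.start - groups[-1][-1].end <= MAX_COMPOUND_GAP:
            groups[-1].append(s)
        else:
            groups.append([s])
    return groups


def at_gc_content(seq: str, ssrs: list[SSR]) -> tuple[float, float]:
    """AT and GC fractions over all bases covered by the given SSRs."""
    upper = seq.upper()
    bases = "".join(upper[s.start:s.end] for s in ssrs)
    at = bases.count("A") + bases.count("T")
    gc = bases.count("G") + bases.count("C")
    return at / (at + gc), gc / (at + gc)


if __name__ == "__main__":
    demo = "A" * 12 + "G" + "AC" * 7 + "G" + "AGC" * 5
    found = find_perfect_ssrs(demo)
    print([(s.start, s.end, canonical_motif(s.motif)) for s in found])
    # [(0, 12, 'A'), (13, 27, 'AC'), (28, 43, 'AGC')]
    print(len(group_compound(found)))  # 1: all three lie within 100 bp of each other
```

On a real assembly the same scan would run scaffold by scaffold. The length distribution (such as the share of SSRs 10–20 bp long) can then be tallied directly from `SSR.length`.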
【Aim】Anopheles sinensis is an important malaria vector in China and Southeast Asia. This study aims to identify and analyze the simple sequence repeats (SSRs, also called microsatellites) in the whole genome of An. sinensis and to annotate the functions of SSR-containing genes. The results are intended to provide a basis for selecting molecular genetic markers in An. sinensis and a foundation for further comparative genomic studies of SSRs in insects. 【Methods】The MISA program was used to identify SSRs in the An. sinensis genome, and Excel 2010 was used to tally the lengths of the identified SSRs. Perl scripts written for this study calculated the base content of the SSRs from their sequences and mapped the SSRs to genomic regions using the location information from SSR identification. WEGO was used to compare the GO functional annotations of SSR-containing genes in An. sinensis and An. gambiae. 【Results】A total of 105 981 SSRs were identified in the An. sinensis genome, at a density of 365.5 SSRs per Mb. Of these, 100 391 (94.7%) are perfect SSRs and the remaining 5 590 (5.3%) are compound SSRs. Mononucleotide SSRs (58 837, 55.5%) are the most abundant, followed by dinucleotide (30 345, 28.6%), trinucleotide (15 104, 14.3%), tetranucleotide (1 530, 1.4%), pentanucleotide (121, 0.1%) and hexanucleotide (44, less than 0.1%) SSRs. (A)n SSRs predominate, followed by (AC)n, (AG)n, (C)n, (AGC)n, (ATC)n, (ACG)n and (ACC)n, each represented by more than 2 000 SSRs. SSRs 10–20 bp long account for 87.1% of the total. Overall, the AT content (63%) of the SSRs is markedly higher than their GC content (37%); only trinucleotide SSRs have a GC content (53%) slightly higher than their AT content (47%). In total, 90 632 (85%) SSRs are located in intergenic regions and 15 349 (15%) in gene regions, of which 2 782 (3%) are in exon regions
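Assigning each SSR to an intergenic, exonic or intronic location, as the mapping step in the Methods does, reduces to interval-overlap tests against gene models. The following minimal Python sketch rests on three assumptions. It uses a simplified, hypothetical `Gene` record with 0-based half-open coordinates on a single scaffold. It also applies an assumed precedence rule: overlap with any exon counts as exonic, and overlap with a gene but no exon counts as intronic. The abstract does not say how the authors' scripts resolved such boundary cases. Finally, a genome-scale run would use an interval index rather than the linear scan shown here.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Gene:
    """Hypothetical minimal gene model: 0-based, half-open coordinates on one scaffold."""
    start: int
    end: int
    exons: list[tuple[int, int]] = field(default_factory=list)


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if the half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


def classify_ssr(ssr_start: int, ssr_end: int, genes: list[Gene]) -> str:
    """Label an SSR as 'exon', 'intron' or 'intergenic'.

    Assumed precedence: overlap with any exon -> 'exon'; overlap with a gene
    span but no exon -> 'intron'; otherwise 'intergenic'.
    """
    in_gene = False
    for g in genes:  # linear scan; an interval index would be used genome-wide
        if not overlaps(ssr_start, ssr_end, g.start, g.end):
            continue
        in_gene = True
        if any(overlaps(ssr_start, ssr_end, e0, e1) for e0, e1 in g.exons):
            return "exon"
    return "intron" if in_gene else "intergenic"


def region_shares(ssrs: list[tuple[int, int]], genes: list[Gene]) -> dict[str, float]:
    """Percentage of all SSRs falling in each region class."""
    counts = Counter(classify_ssr(start, end, genes) for start, end in ssrs)
    total = sum(counts.values())
    return {region: 100 * n / total for region, n in counts.items()}


if __name__ == "__main__":
    genes = [Gene(100, 500, exons=[(100, 150), (400, 500)])]
    ssrs = [(120, 135), (200, 215), (600, 612), (700, 710)]
    print(region_shares(ssrs, genes))
    # {'exon': 25.0, 'intron': 25.0, 'intergenic': 50.0}
```

As in the sketch, the shares in the abstract are percentages of all SSRs. For example, 90 632 of 105 981 SSRs (about 85.5%) are intergenic, and 2 782 (about 2.6%) are exonic.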