Prediction of Horizontally Transferred Genes in Bacterial Genomes Based on Support Vector Machines
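  • ISSN: 1000-3282
  • Journal: Progress in Biochemistry and Biophysics (《生物化学与生物物理进展》)
  • Date: 0
  • Classification: S718.83 [Agricultural Sciences: Forestry]
  • Author affiliation: [1] State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096
  • Funding: National Natural Science Foundation of China (60671018, 60121101).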
Abstract (translated from Chinese):

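With genome sequencing completed for a growing number of organisms, large volumes of DNA sequence data have become available, which greatly facilitates the search for horizontally transferred genes within genomes. This work combines gene sequence feature analysis with support vector machines (SVM) and identifies horizontally transferred genes by analyzing differences in gene sequence features. Building on previous work, absolute codon usage frequency (FCU) was chosen as the sequence feature, mainly because it carries information on both the codon usage bias of a gene and the amino acid composition of the protein it encodes; by using this information for analysis and prediction, the SVM can achieve higher prediction accuracy. In addition, a new strand-based prediction method is proposed, in which genes on the leading and lagging strands of a bacterial genome are treated separately and horizontally transferred genes are predicted for each strand independently. The results show that the basic method outperforms the scoring algorithm based on eight-nucleotide frequencies proposed by Tsirigos et al., currently the best-performing predictor, with a relative improvement in hit rate of up to 31.47%, and that the strand-based method yields still better predictions.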
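
As an illustration of the sequence feature described in the abstract, the following is a minimal sketch of a codon usage frequency vector. It assumes that FCU means the count of each of the 64 codons divided by the total number of codons in the gene; the abstract does not give the exact definition, and the function name `codon_usage_frequency` is hypothetical.

```python
from itertools import product

# All 64 codons in a fixed order, so every gene maps to a vector of the same layout.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_usage_frequency(cds):
    """Return a 64-element list of codon frequencies for one coding sequence.

    Assumption: frequency = codon count / total codons counted in this gene.
    Codons containing characters other than A, C, G, T are skipped.
    """
    seq = cds.upper()
    counts = dict.fromkeys(CODONS, 0)
    total = 0
    usable_length = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    for i in range(0, usable_length, 3):
        codon = seq[i:i + 3]
        if codon in counts:
            counts[codon] += 1
            total += 1
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]


vector = codon_usage_frequency("ATGAAAAAATAA")  # codons: ATG, AAA, AAA, TAA
```

Because synonymous codons are not normalized against each other, a vector of this kind reflects both which amino acids a gene uses and which codons it prefers for them, which is the property the abstract attributes to FCU. Such vectors could then serve as input features to an SVM classifier.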

English abstract:

Horizontal gene transfer (HGT), also called lateral gene transfer (LGT), is any process in which an organism transfers genetic material to another organism that is not its offspring. As more genomic data become available, it has become easier to study how to detect the genes in a given genome that are products of horizontal transfer. Few horizontal gene transfers are known in the three bacterial genomes under consideration, so experiments were carried out that simulated gene transfer by artificially inserting phage genes. Combining feature analysis of gene sequences with the support vector machine (SVM), a novel method was developed for identifying horizontally transferred genes in three fully sequenced bacterial genomes (Escherichia coli K12, Borrelia burgdorferi, Bacillus cereus ZK). Based on our previous work, codon use frequency (FCU) was selected as the sequence feature, because it inherently fuses the signals of codon usage bias and amino acid composition. In addition, a second computational method was proposed that takes strand asymmetry into account and predicts horizontally transferred genes separately on the leading strand and the lagging strand of each genome. To reduce the effect of chance in simulating gene transfer by artificially inserting phage genes, the transfer-and-recover experiment was repeated 100 times, and the arithmetic mean of each measurement was reported for each genome to evaluate the algorithm's performance. Ten-fold cross-validation was used for both parameter selection and accuracy estimation. For C-Support Vector Classification (C-SVC), the best results were obtained with the radial basis function kernel with γ=100, while for one-class SVM the best performance was obtained with a polynomial kernel of degree three.
The performance of the approach was compared with that of Tsirigos' method, which is one of the best predictive approaches to date for detecting horizontal trans
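
The strand-based method mentioned in the abstract needs each gene to be assigned to the leading or the lagging strand before prediction. The sketch below shows one common way to make that assignment on a circular chromosome. It assumes the replication origin and terminus coordinates are known and that a gene is "leading" when it is transcribed in the same direction as the replication fork passing through it. The abstract does not say how the authors made this assignment, and the function name and all coordinates here are hypothetical.

```python
def on_leading_strand(gene_mid, strand, ori, ter, genome_len):
    """Return True if a gene lies on the leading strand of a circular genome.

    gene_mid   -- midpoint coordinate of the gene
    strand     -- "+" or "-" as annotated on the reference sequence
    ori, ter   -- assumed coordinates of the replication origin and terminus
    genome_len -- total length of the circular chromosome
    """
    # Distances measured in the direction of increasing coordinates from ori.
    gene_offset = (gene_mid - ori) % genome_len
    ter_offset = (ter - ori) % genome_len
    # Between ori and ter the fork moves toward increasing coordinates,
    # so "+" genes are co-oriented with it; in the other half the fork
    # moves toward decreasing coordinates, so "-" genes are co-oriented.
    fork_moves_forward = gene_offset < ter_offset
    return (strand == "+") == fork_moves_forward


# Hypothetical genes: (midpoint, strand) on a genome of length 1000.
genes = [(300, "+"), (300, "-"), (800, "-"), (50, "+")]
leading = [g for g in genes if on_leading_strand(g[0], g[1], 100, 600, 1000)]
lagging = [g for g in genes if g not in leading]
```

With the genes split this way, a separate classifier could be trained and evaluated on each group, so that compositional differences between the two strands are less likely to be mistaken for signs of foreign origin.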

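Journal information
  • Progress in Biochemistry and Biophysics (《生物化学与生物物理进展》)
  • China Science and Technology Core Journal
  • Supervising body: Chinese Academy of Sciences
  • Sponsors: Institute of Biophysics, Chinese Academy of Sciences; Biophysical Society of China
  • Editor-in-chief: Wang Dacheng (王大成)
  • Address: 15 Datun Road, Chaoyang District, Beijing
  • Postal code: 100101
  • Email: prog@sun5.ibp.ac.cn
  • Telephone: 010-64888459
  • International standard serial number: ISSN 1000-3282
  • Domestic unified serial number: CN 11-2161/Q
  • Postal distribution code: 2-816
  • Awards:
  • Nominee, Chinese Journal Award (1999); Special Award for Outstanding Journals, Chinese Academy of Sciences (2000)
  • Indexed in domestic and international databases:
  • Chemical Abstracts, online edition (USA); abstract and citation database (Netherlands); Cambridge Scientific Abstracts (USA); Science Citation Index Expanded (USA); biological sciences database (USA); Japan Science and Technology Agency database (Japan); China Science and Technology Core Journals (China); Peking University Core Journals (China), 2000, 2004, 2008, 2011 and 2014 editions
  • Citations: 18821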