A Mixture Gibbs Sampling Algorithm for Identifying Biological Sequence Motifs
  • Journal: 电子学报 (Acta Electronica Sinica), 2008, 36 (04): 750-755. (EI: 082411312846)
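  • Classification: Q811.4 [Biology: Bioengineering]; TP301.6 [Automation and Computer Technology: Computer System Architecture; Automation and Computer Technology: Computer Science and Technology]
  • Author affiliation: [1] School of Computer Science, Xidian University, Xi'an, Shaanxi 710071
  • Funding: National Natural Science Foundation of China (No. 60705004); Natural Science Foundation of Shaanxi Province (No. 2005F33)
  • Related project: Transcription factor binding site identification with constrained multinomial distributions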
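Abstract (translated from the Chinese):

To address the problem of identifying motifs in biological sequences, a new mixture Gibbs sampling algorithm is proposed. The algorithm is based on learning a mixture motif model: using a greedy strategy, it adds new motifs to the mixture model one at a time by maximizing the likelihood. Two sampling methods, site sampling and motif sampling, are designed and applied alternately. To speed up the search, the input dataset is partitioned hierarchically using kd-trees. Experimental results show that the algorithm has a significant advantage in identifying large numbers of motif features in sequence families, and that it can build motif models with stronger statistical characteristics, thereby improving the accuracy of sequence classification.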

English abstract:

For the motif discovery problem in biological sequences, a mixture Gibbs sampling algorithm is presented. It learns a mixture motif model by likelihood maximization, using a greedy strategy that sequentially adds new motifs to the mixture model. Two sampling methods, site sampling and motif sampling, are designed and applied in alternation. To speed up the search, a hierarchical partitioning scheme based on kd-trees is used to partition the input dataset. Experimental results indicate that the proposed algorithm is advantageous in identifying larger groups of motifs characteristic of biological families. In addition, it offers better diagnostic capabilities by building more powerful statistical motif models with improved classification accuracy.
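
For readers unfamiliar with site sampling, the following is a minimal sketch of the classic single-motif Gibbs site sampler over DNA sequences. It is not the paper's algorithm: the abstract gives no implementation details, so the motif sampling step, the greedy growth of the mixture model, and the kd-tree partitioning are not shown. All names (`build_profile`, `site_sampling_step`, `gibbs_site_sampler`), the one-site-per-sequence assumption, the uniform background, and the pseudocount value are illustrative assumptions.

```python
import random

ALPHABET = "ACGT"  # assumption: DNA sequences over a 4-letter alphabet


def build_profile(seqs, starts, width, skip, pseudo=1.0):
    """Per-column letter frequencies from the current sites, leaving out sequence `skip`."""
    counts = [{c: pseudo for c in ALPHABET} for _ in range(width)]
    for i, (seq, start) in enumerate(zip(seqs, starts)):
        if i == skip:
            continue
        for j in range(width):
            counts[j][seq[start + j]] += 1
    profile = []
    for col in counts:
        total = sum(col.values())
        profile.append({c: col[c] / total for c in ALPHABET})
    return profile


def site_sampling_step(seqs, starts, width, i, rng):
    """Resample the motif start in sequence i, conditioned on the sites in all other sequences."""
    profile = build_profile(seqs, starts, width, skip=i)
    seq = seqs[i]
    weights = []
    for start in range(len(seq) - width + 1):
        ratio = 1.0
        for j in range(width):
            # Likelihood ratio of motif model vs. a uniform background (assumed).
            ratio *= profile[j][seq[start + j]] / 0.25
        weights.append(ratio)
    # Draw the new start with probability proportional to its likelihood ratio.
    starts[i] = rng.choices(range(len(weights)), weights=weights, k=1)[0]


def gibbs_site_sampler(seqs, width, sweeps=200, seed=0):
    """Run `sweeps` full passes of site sampling and return one start position per sequence."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(sweeps):
        for i in range(len(seqs)):
            site_sampling_step(seqs, starts, width, i, rng)
    return starts
```

Calling `gibbs_site_sampler(["ACGTACGTTT", "TTACGTAAAA", "GGGACGTACC"], width=4)` returns one sampled start index per sequence. Because the sampler is stochastic, the result depends on the seed and on the number of sweeps.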
