A Repeats Identification Algorithm for DNA Sequences Based on an Adaptive Suffix Tree

ISSN: 0254-4164
Journal: Chinese Journal of Computers (《计算机学报》)
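Classification: TP18 [Automation and Computer Technology: Control Science and Engineering; Automation and Computer Technology: Control Theory and Control Engineering]
Author affiliation: [1] School of Computer Science, Xidian University, Xi'an 710071
Funding: Supported by the National Natural Science Foundation of China (69601003) and the Young Scientists Fund (60705004)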

Keywords: repeats identification, adaptive suffix tree, Ukkonen algorithm, RepSeeker algorithm

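Abstract (translated from Chinese):

Most existing algorithms for identifying repeats in DNA sequences are alignment-based, which severely limits their speed and throughput. To address this problem, the paper adopts a definition of repeats that balances length against frequency and proposes RepSeeker, a fast repeats identification algorithm based on the Ukkonen suffix tree. The algorithm applies a minimum frequency limit and extends each repeat to the greatest possible length. To further improve the efficiency of RepSeeker, Ukkonen's suffix tree construction algorithm is adaptively modified: the node information that RepSeeker needs is added during construction, and leaf nodes are distinguished from branch nodes, so that RepSeeker can obtain the frequency and positions of a substring by reading node information directly. This modification considerably improves the performance of RepSeeker at a small cost in space. Nine typical DNA sequences from NCBI were used as test data, and the repeats identification algorithm was compared before and after the suffix tree modification. The results show that RepSeeker reduces running time without loss of accuracy, and the experimental results are consistent with the theoretical analysis.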
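The idea of reading substring frequency and positions straight from node annotations can be illustrated with a small sketch. This is not the paper's implementation: it builds an uncompressed suffix trie naively in O(n^2) time rather than using Ukkonen's linear-time construction, and the names `Node`, `build_suffix_trie`, `frequency`, and `positions` are hypothetical. It only shows how a count stored on every node and a start position stored on every leaf make both queries a matter of walking to a node and reading its fields.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    # Every node keeps an occurrence count; only leaves keep a start position,
    # which is how this sketch tells leaves and branch nodes apart.
    children: Dict[str, "Node"] = field(default_factory=dict)
    count: int = 0                      # number of suffixes passing through
    suffix_start: Optional[int] = None  # set only on leaf nodes


def build_suffix_trie(text: str) -> Node:
    """Naive O(n^2) construction (assumption: stands in for Ukkonen's algorithm)."""
    text += "$"  # unique terminator, so every suffix ends at its own leaf
    root = Node()
    for start in range(len(text)):
        node = root
        node.count += 1
        for ch in text[start:]:
            node = node.children.setdefault(ch, Node())
            node.count += 1  # annotate during construction
        node.suffix_start = start
    return root


def _find(root: Node, pattern: str) -> Optional[Node]:
    node: Optional[Node] = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return None
    return node


def frequency(root: Node, pattern: str) -> int:
    """Occurrence count of pattern, read directly from the node annotation."""
    node = _find(root, pattern)
    return node.count if node else 0


def positions(root: Node, pattern: str) -> List[int]:
    """Start positions of pattern, collected from the leaves below its node."""
    node = _find(root, pattern)
    if node is None:
        return []
    found, stack = [], [node]
    while stack:
        cur = stack.pop()
        if cur.suffix_start is not None:
            found.append(cur.suffix_start)
        stack.extend(cur.children.values())
    return sorted(found)


root = build_suffix_trie("ACGACGTACG")
print(frequency(root, "ACG"), positions(root, "ACG"))  # 3 [0, 3, 7]
```

A real suffix tree compresses unary paths into edges and would store the same annotations on far fewer nodes, which is consistent with the abstract's claim that the extra space cost is small.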

English abstract:

Many existing methods for repeats identification are based on alignment, which significantly limits their speed and throughput. This paper presents the fast Rep(eats)Seeker algorithm for repeats identification, based on an adaptive variant of the Ukkonen suffix tree construction algorithm. The RepSeeker algorithm uses a minimum frequency limit to maximize the extension of repeats. Adaptive improvements are made to the Ukkonen algorithm to increase the efficiency of RepSeeker: the node information that RepSeeker requires is added during suffix tree construction, and leaf nodes are distinguished from branch nodes, so that RepSeeker can read the frequency and positions of a substring directly from the nodes. The improvement is considerable for repeats identification and comes at little extra cost in space. Nine sequences from the National Center for Biotechnology Information (NCBI) are used to test the performance of the RepSeeker algorithm. Comparisons before and after the improvements to the suffix tree construction show that the running time of the RepSeeker algorithm is reduced without loss of accuracy. The experimental results agree with the theoretical expectations.
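The abstract does not spell out how the frequency limit drives extension, so the following is only one plausible reading, sketched under stated assumptions: a seed is grown to the right one base at a time, and growth stops as soon as fewer than `min_freq` copies agree on the next base. The function names `occurrences` and `extend_right`, the greedy choice of the largest agreeing group, and the plain string scan used instead of a suffix tree are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def occurrences(text: str, seed: str) -> List[int]:
    """All (possibly overlapping) start positions of seed in text."""
    return [i for i in range(len(text) - len(seed) + 1) if text.startswith(seed, i)]


def extend_right(text: str, starts: List[int], length: int,
                 min_freq: int) -> Tuple[int, List[int]]:
    """Grow a repeat rightwards while at least min_freq copies still agree."""
    while True:
        # Group the current copies by the base that follows each of them.
        groups: Dict[str, List[int]] = defaultdict(list)
        for s in starts:
            end = s + length
            if end < len(text):
                groups[text[end]].append(s)
        if not groups:
            return length, starts  # every copy has reached the end of the text
        best = max(groups.values(), key=len)
        if len(best) < min_freq:
            return length, starts  # extending would drop below the limit
        starts, length = best, length + 1


text = "ACGTACGTTTACGA"
length, starts = extend_right(text, occurrences(text, "AC"), 2, min_freq=2)
print(text[starts[0]:starts[0] + length], starts)  # ACGT [0, 4]
```

With `min_freq=3` the same seed stops one base earlier, at `ACG`, because only two of the three copies are followed by `T`. That trade-off between length and frequency is the balance the abstract refers to.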

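Projects associated with this journal paper

Identification of transcription factor binding sites with constrained multinomial distributions

Journal papers: 35; conference papers: 5

Optimal neural network structure design based on genetic algorithms

Journal papers: 5

Journal papers from the same project

A genetic optimization algorithm for the (l, d)-motif identification problem

Progressive transductive learning pattern classification via single sphere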

A Greedy Two-stage Gibbs Sampling Method for Motif Discovery in Biological Sequences

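Direct support vector regression machine

Support vector machine based on hull vectors and center vectors

Non-coding background sequences modeling based on Bayesian hypothesis testing

Training linear support vector machines in the primal space with the Rosenbrock algorithm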

Fast Training of SVDD by Extracting Boundary Targets

Space support vector domain classifier

Hooke and Jeeves algorithm for linear support vector machine

A differential evolution algorithm free of the scaling factor F

Detection of over-represented motifs corresponding to known TFBSs via motif clustering and matching

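Motif GibbsGA: Sampling transcription factor binding sites coupled with PSFM optimization by genetic

A hybrid Gibbs sampling algorithm for motif identification in biological sequences

An MPE-JT construction algorithm for Bayesian networks based on full conditional independence

An improved progressive transductive support vector machine algorithm

A study of variant algorithms of least squares support vector machines

Proximal support vector machine for regression

Bayesian testing of biological sequence motifs based on moment estimation

Bayesian hypothesis testing of biological sequence motifs based on the maximum likelihood criterion

A confidence-based progressive transductive support vector machine algorithm

A new evolutionary algorithm for the quality-of-service routing problem

Supervised outlier detection based on the Mahalanobis ellipsoidal learning machine

An improved algorithm for constructing essential graphs

Multi-attribute decision making based on proximal support vector regression

A PSO-SVM model for predicting rock burst hazard levels

A polynomial smoothing function method for generalized support vector machines

An improvement to the one-class Mahalanobis ellipsoidal learning machine

A set of deep packet inspection algorithms with improved storage efficiency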

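A repeats identification algorithm with exact boundaries

A DNA multiple sequence alignment method based on the maximum-weight path algorithm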

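Journal information

Chinese Journal of Computers (《计算机学报》)
Peking University Core Journal (2011 edition)

Supervising body: Chinese Academy of Sciences
Sponsors: China Computer Federation; Institute of Computing Technology, Chinese Academy of Sciences
Editor-in-chief: Sun Ninghui (孙凝晖)
Address: 6 Kexueyuan South Road, Zhongguancun, Beijing
Postal code: 100190
Email: cjc@ict.ac.cn
Phone: 010-62620695

International standard serial number: ISSN 0254-4164
Domestic serial number: CN 11-1826/TP
Postal distribution code: 2-833

Awards:
"Double Benefit" journal of the China Periodical Matrix (中国期刊方阵“双效”期刊)

Indexed in domestic and international databases:
Mathematical Reviews, online edition (USA), Abstract and Citation Database (Netherlands), Engineering Index (USA), Cambridge Scientific Abstracts (USA), Japan Science and Technology Agency database (Japan), China Science and Technology Core Journals (China), Peking University Core Journals (China; 2000, 2004, 2008, 2011, and 2014 editions)

Citations: 48433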