Identification of MicroRNA Precursors with Support Vector Machine and String Kernel
  • Journal: Genomics, Proteomics & Bioinformatics
  • Date: 0
  • Pages: 121-128
  • Language: Chinese
  • Classification: Q81 [Biology: Bioengineering]
  • Author affiliations: [1] Department of Computer Science, Nanjing Normal University, Nanjing 210097, China; [2] Department of Entomology, Nanjing Agricultural University, Nanjing 210095, China
  • Funding: This work was partly supported by the National Natural Science Foundation of China (No. 60405001 and 60875001) and the Natural Science Foundation of Jiangsu Province, China (No. BK2004142).
  • Related project: Research on multi-label classification algorithms based on kernel, regularization, and multi-objective optimization techniques and their applications
Abstract:

MicroRNAs (miRNAs) are a family of short (21-23 nt) regulatory non-coding RNAs processed from long (70-110 nt) miRNA precursors (pre-miRNAs). Distinguishing true from false precursors plays an important role in the computational identification of miRNAs. Numerical features have been extracted from precursor sequences and their secondary structures to suit various classification methods; however, such features may lose useful discriminative information hidden in the sequences and structures. In this study, pre-miRNA sequences and their secondary structures are used directly to construct an exponential kernel based on the weighted Levenshtein distance between two sequences. This string kernel is then combined with a support vector machine (SVM) to detect true and false pre-miRNAs. Based on 331 training samples of true and false human pre-miRNAs, 2 key parameters of the SVM are selected by 5-fold cross validation and grid search, and 5 realizations with different 5-fold partitions are executed. Among 16 independent test sets comprising 3 human, 8 animal, 2 plant, 1 virus, and 2 artificially false human pre-miRNA sets, our method statistically outperforms the previous SVM-based technique on 11 sets, including 3 human, 7 animal, and 1 false human pre-miRNA sets. In particular, pre-miRNAs with multiple loops, which were usually excluded in previous work, are correctly identified in this study with an accuracy of 92.66%.
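
The kernel described in the abstract can be illustrated with a minimal sketch. The abstract does not give the weighting scheme, the kernel width, or how sequence and secondary structure are encoded together, so the function names, the uniform weights `w_sub` and `w_indel`, and the parameter `gamma` below are all assumptions made for illustration, not the authors' settings.

```python
import math

def weighted_levenshtein(a, b, w_sub=1.0, w_indel=1.0):
    """Edit distance between strings a and b.

    w_sub and w_indel are hypothetical uniform weights for a substitution
    and for an insertion/deletion; the paper's actual weights are not
    stated in the abstract.
    """
    # prev[j] holds the distance between the current prefix of a and b[:j].
    prev = [j * w_indel for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * w_indel]
        for j, cb in enumerate(b, start=1):
            cost = 0.0 if ca == cb else w_sub
            curr.append(min(prev[j] + w_indel,      # delete ca
                            curr[j - 1] + w_indel,  # insert cb
                            prev[j - 1] + cost))    # match or substitute
        prev = curr
    return prev[-1]

def exponential_kernel(a, b, gamma=0.1, **weights):
    """exp(-gamma * distance); gamma is an assumed kernel-width parameter."""
    return math.exp(-gamma * weighted_levenshtein(a, b, **weights))

def gram_matrix(seqs, gamma=0.1, **weights):
    """Pairwise kernel values for a list of strings."""
    return [[exponential_kernel(s, t, gamma, **weights) for t in seqs]
            for s in seqs]

# Toy RNA strings, not data from the paper.
K = gram_matrix(["GCAUGC", "GCUUGC", "AAUGCC"], gamma=0.1)
```

A matrix like `K` could then be passed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`, with `gamma` and the SVM penalty tuned by grid search under cross validation as the abstract describes.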