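MicroRNAs (miRNAs) are short, functional non-coding RNA sequences that take part in regulating gene expression in animals and plants. The first miRNA was discovered experimentally, but experimental identification of miRNAs remains technically very challenging and incomplete. miRNA gene identification therefore needs computational methods to compensate for the shortcomings of experimental ones. This paper proposes a new method for identifying miRNA precursors. In building the recognition model, the primary sequence is combined with the sequence's secondary structure, the κ-gram method maps the sequence information into a high-dimensional feature space, a feature selection method then extracts features, and these features are used to construct an SVM-based model for recognizing miRNA precursors. The method is also compared with a hidden Markov model (HMM) learning approach. Experimental results show that the method is effective and achieves high sensitivity and specificity.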
MicroRNAs (miRNAs) are short non-coding RNAs that play important regulatory roles in both animals and plants. While the first miRNAs were discovered experimentally, experimental miRNA identification remains technically challenging and incomplete. Hence, computational approaches are a natural complement to experimental approaches to miRNA gene identification. A de novo miRNA precursor prediction method is proposed. In constructing the recognition model, the primary sequence and the secondary structure were combined into a single input sequence through encoding, and the input space was mapped into a feature space via the κ-gram method. After feature selection, the selected features were used to construct SVM-based models for the recognition of miRNA precursors. The method was also compared with an HMM learning method. Experimental results show that the method outperforms the HMM. The reason is that microRNAs are so short that an HMM cannot easily capture the signals that differentiate genuine microRNAs from pseudo-microRNA genes. Most of the selected features were found to come from the primary sequence and secondary structure of the microRNAs. This suggests that, until the transcription mechanism of microRNAs is fully understood biologically, computational methods should concentrate on the microRNAs themselves.
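The encoding and κ-gram mapping described above can be illustrated with a minimal sketch. The abstract does not specify the encoding, so several choices here are assumptions made only for illustration: the secondary structure is taken to be in dot-bracket notation, each position is encoded by pairing its nucleotide with its structure symbol, and the function names `encode`, `kgram_features`, and `to_vector` are hypothetical. Feature selection and SVM training are not shown.

```python
from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGU"
STRUCTURE = "(.)"  # assumption: dot-bracket symbols for the secondary structure


def encode(sequence, structure):
    """Merge primary sequence and structure into one token sequence.

    Assumed encoding: one token per position, e.g. 'G' paired with '(' gives 'G('.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    return [n + s for n, s in zip(sequence.upper(), structure)]


def kgram_features(tokens, k):
    """Count every contiguous run of k tokens (a k-gram)."""
    return Counter(tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1))


def to_vector(counts, k):
    """Lay the counts out as a fixed-length vector over all possible k-grams."""
    alphabet = [n + s for n in NUCLEOTIDES for s in STRUCTURE]
    return [counts.get(gram, 0) for gram in product(alphabet, repeat=k)]


# Toy hairpin, invented for illustration only (not data from the paper).
tokens = encode("GGGAAACCC", "(((...)))")
counts = kgram_features(tokens, 2)
vector = to_vector(counts, 2)
print(len(vector), sum(vector))
```

With a 12-symbol combined alphabet the vector has 12^k entries, so the dimension grows quickly with k. This is one reason a feature selection step before training the classifier is useful.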