Long non-coding RNAs (lncRNAs) resemble protein-coding genes in several respects during transcription, which makes it difficult to identify lncRNAs efficiently by sequencing alone. To address this problem, we used bioinformatic methods to perform a genome-wide analysis of lncRNA genes and protein-coding genes in two mammals, human and mouse, based on the high-confidence GENCODE gene annotation. The results show that, compared with protein-coding genes, lncRNAs have distinctive characteristics in gene structure, sequence composition, protein-coding potential and conservation, which provides useful information for identifying lncRNAs accurately.
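The abstract does not say which tools or feature definitions were used. As an illustration only, the following minimal sketch computes simple proxies for three of the feature types it names: exon count per transcript (gene structure), GC content (sequence composition) and longest open reading frame (a crude proxy for protein-coding potential). Conservation is not covered because it requires external alignment or conservation-score data. The function names and the toy data are hypothetical. The sketch assumes a GENCODE-style GTF whose attribute column carries `transcript_id` and `gene_type`, and it assumes the value `"lncRNA"` used by recent GENCODE releases; older releases split lncRNAs into several biotypes such as `lincRNA` and `antisense`.

```python
import re
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}
# Matches key "value" pairs in a GTF attribute column (assumed GENCODE style).
_ATTR = re.compile(r'(\S+) "([^"]*)"')


def gc_content(seq: str) -> float:
    """Fraction of G/C bases: a simple sequence-composition feature."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_orf_length(seq: str) -> int:
    """Length in nt (stop codon included) of the longest ATG...stop ORF on the
    forward strand; a crude proxy for coding potential, not a validated score."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open an ORF at the first ATG in this frame
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None  # close the ORF; look for the next ATG
    return best


def exons_per_transcript(gtf_lines, gene_type):
    """Count exon records per transcript_id for one gene_type (gene structure)."""
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        attrs = dict(_ATTR.findall(fields[8]))
        if attrs.get("gene_type") == gene_type:
            counts[attrs["transcript_id"]] += 1
    return dict(counts)


def _gtf_line(feature, gene_type, transcript_id):
    """Hypothetical helper that builds one toy GTF record for the example."""
    attrs = (f'gene_id "G_{transcript_id}"; transcript_id "{transcript_id}"; '
             f'gene_type "{gene_type}";')
    return "\t".join(["chr1", "HAVANA", feature, "1", "100", ".", "+", ".", attrs])


# Toy data for illustration only; not taken from GENCODE.
SAMPLE_GTF = [
    "##description: toy example",
    _gtf_line("transcript", "lncRNA", "T1"),
    _gtf_line("exon", "lncRNA", "T1"),
    _gtf_line("exon", "lncRNA", "T1"),
    _gtf_line("exon", "protein_coding", "T2"),
]

if __name__ == "__main__":
    print(exons_per_transcript(SAMPLE_GTF, "lncRNA"))
    print(gc_content("GGCCAT"), longest_orf_length("CCATGAAATAGCC"))
```

In a real comparison, one would compute these values for every lncRNA and protein-coding transcript and then compare their distributions between the two classes. Per-transcript values like these are not meaningful on their own.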