Sequence classification methods are widely used in bioinformatics, for example to identify transcriptional regulatory elements and to predict protein structure. Here we present a new classification method for short sequences based on their sequence features, and apply it to the study of RNA splicing regulatory elements. The method extracts sequence features from known splicing regulatory elements and builds a scoring system that estimates how likely a given short sequence is to regulate RNA splicing. As an application, we used known exonic splicing enhancer (ESE) and silencer (ESS) octamers as test data and compared the predictions of this method with those of several commonly used methods. Across three kinds of computational validation experiments, the method reached an average prediction accuracy of about 93%, and its transparent predictive structure helps interpret the underlying biological mechanism. This work provides a new method for analyzing biological sequence data and a new bioinformatic route to studying the regulatory sequences that control gene expression.
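The abstract does not state which sequence features or scoring function the method uses. The sketch below is only an illustration under assumed choices, not the authors' algorithm. It assumes position-specific dinucleotides as features and weights each feature by a smoothed log-odds ratio of its frequency in enhancer versus silencer training octamers. A sequence's score is then the sum of its feature weights. The feature set, the pseudocount, and the names `features`, `train_weights`, `score`, `classify` and `top_features` are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical feature choice: (position, dinucleotide) pairs along the sequence.
Feature = Tuple[int, str]


def features(seq: str) -> List[Feature]:
    """Position-specific dinucleotides of a short sequence (assumed feature set)."""
    seq = seq.upper()
    return [(i, seq[i:i + 2]) for i in range(len(seq) - 1)]


def _count(seqs: Iterable[str]) -> Tuple[Counter, int]:
    """Count feature occurrences over a set of sequences and return the set size."""
    counts: Counter = Counter()
    n = 0
    for s in seqs:
        counts.update(features(s))
        n += 1
    return counts, n


def train_weights(enhancers: Iterable[str], silencers: Iterable[str],
                  pseudo: float = 1.0) -> Dict[Feature, float]:
    """Smoothed log-odds weight of each feature, enhancer vs silencer (assumed scoring)."""
    pos, n_pos = _count(enhancers)
    neg, n_neg = _count(silencers)
    weights: Dict[Feature, float] = {}
    for f in set(pos) | set(neg):
        # A position-specific feature occurs at most once per sequence, so these
        # are smoothed per-sequence frequencies strictly between 0 and 1.
        p = (pos[f] + pseudo) / (n_pos + 2 * pseudo)
        q = (neg[f] + pseudo) / (n_neg + 2 * pseudo)
        weights[f] = math.log(p / q)
    return weights


def score(seq: str, weights: Dict[Feature, float]) -> float:
    """Sum of feature weights; features never seen in training contribute 0."""
    return sum(weights.get(f, 0.0) for f in features(seq))


def classify(seq: str, weights: Dict[Feature, float], threshold: float = 0.0) -> str:
    """Label a sequence by comparing its score with a (hypothetical) threshold."""
    return "ESE-like" if score(seq, weights) > threshold else "ESS-like"


def top_features(weights: Dict[Feature, float], k: int = 5) -> List[Tuple[Feature, float]]:
    """The k most enhancer-associated features, for inspecting what drives a score."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

For example, `w = train_weights(["GAAGAAGA", "GAAGAAGG"], ["TTAGGGTT", "CTAGGGTA"])` trains on toy strings that were invented for illustration and are not real ESE/ESS octamers. After that, `classify("GAAGAAGA", w)` returns `"ESE-like"`. An additive score keeps the model transparent in the sense the abstract describes, because every prediction decomposes into per-feature weights that `top_features` can list.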