Splicing is a tightly regulated step that links transcription and translation during gene expression, and splice sites are its core regulatory elements. By mining the sequence features contained in splice site sequences, we propose a scoring model for splice site sequences based on sequential pattern mining. The model gives a quantitative measure of the signal strength of a splice site sequence. Experimental results show that the model effectively discriminates real from pseudo splice site sequences, that its classification performance exceeds that of the maximum information entropy model, that it is robust, and that it effectively identifies pathogenic splice site sequence mutations.