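Using signal processing techniques to identify gene segments in DNA base sequences has become an important approach to gene recognition. Re-encoded DNA sequences contain a large amount of noise, however, so many current recognition algorithms cannot accurately locate the start positions of exon segments. By implementing and improving the "fixed-length sliding-window spectrum-curve method" and the "shifted-sequence signal-to-noise-ratio method", this study proposes a gene recognition algorithm based on variable windows and shifted sequences. First, the existing gene recognition algorithms are implemented in code. Wavelet analysis is then used to denoise the recognition results. Next, the choice of the optimal fixed window length M is examined, and a gene prediction model based on variable windows and shifted sequences is proposed and implemented. Finally, the model is applied to known gene sequences, where it reaches a recognition accuracy of 77.57%.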
Signal processing and analysis has become one of the most important approaches to identifying gene coding regions in DNA base sequences. Because re-encoded DNA sequences contain random noise, existing recognition algorithms have difficulty determining the start positions of exon intervals accurately. By implementing and improving the "Spectral Rotation Measure" and "Different Starting Points" algorithms, this paper proposes a gene recognition algorithm based on multi-scale windows and moving sequences. First, we implemented the existing gene identification algorithms. Second, we used wavelet analysis to denoise the identification results. Third, we determined the optimal fixed length M, built a new gene prediction model based on the spectral rotation measure and different starting points, and implemented it in code. Finally, we applied the model to known gene sequences, and the recognition accuracy reached 77.57%.
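As a rough illustration of the kind of computation that underlies a fixed-length sliding-window spectrum and a signal-to-noise ratio for exon detection, the sketch below measures the period-3 spectral component that protein-coding regions typically exhibit. It is not the paper's implementation. The binary indicator (Voss) mapping, the function names, the window length of 351, the step of 3, and the SNR definition (power at k = N/3 divided by the mean power over all frequencies) are all assumptions chosen for illustration.

```python
import numpy as np

def voss_indicators(seq):
    """Map a DNA string to four binary indicator sequences (A, C, G, T).

    Assumption: the Voss mapping is used here only as a common example
    of re-encoding a DNA sequence as numerical signals.
    """
    seq = seq.upper()
    return {b: np.array([1.0 if ch == b else 0.0 for ch in seq]) for b in "ACGT"}

def period3_power(seq, window=351, step=3):
    """Total spectral power at frequency 1/3 inside each sliding window.

    `window` plays the role of the fixed length M and must be a multiple
    of 3; 351 is a hypothetical value, not the optimum found in the paper.
    Returns the window start positions and the power in each window.
    """
    if window % 3 != 0:
        raise ValueError("window must be a multiple of 3")
    indicators = voss_indicators(seq)
    # DFT basis vector for bin k = window / 3
    kernel = np.exp(-2j * np.pi * np.arange(window) / 3)
    starts = list(range(0, len(seq) - window + 1, step))
    power = [
        sum(abs(np.dot(u[s:s + window], kernel)) ** 2 for u in indicators.values())
        for s in starts
    ]
    return np.array(starts), np.array(power)

def period3_snr(seq):
    """Power at k = N/3 divided by the mean power over all frequencies.

    The sequence is truncated to a multiple of 3 so that bin N/3 exists.
    """
    n = len(seq) - len(seq) % 3
    indicators = voss_indicators(seq[:n])
    spectrum = sum(np.abs(np.fft.fft(u)) ** 2 for u in indicators.values())
    return spectrum[n // 3] / spectrum.mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    periodic = "ATG" * 100                        # strongly period-3 toy sequence
    random_seq = "".join(rng.choice(list("ACGT"), size=300))
    print("SNR (periodic):", period3_snr(periodic))
    print("SNR (random):  ", period3_snr(random_seq))
```

In a sketch like this, windows with high period-3 power are candidate coding regions. Smoothing the resulting curve (for example with wavelet denoising) and varying the window length would be separate steps layered on top.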