作为基因功能预测的主要手段,序列相似性查询技术是生物信息学领域的研究热点.基因序列和结构的相似性往往决定了基因功能的相似性,因此可以通过基因序列的相似性查找来预测新基因的功能.分析了MRS索引中频率变换和小波变换等相关技术,讨论了它们的缺点和不足,提出了一种基于二分频率变换2-PFT的序列相似性查询处理技术.首先,设计了二分频率变换和相应的距离函数,使得系统较之频率变换和小波变换具有更高的过滤能力,极大地提高了系统的性能;其次,解决了处理任意长度查询的问题.理论证明和实验结果均表明,2-PFT系统的性能远远优于MRS系统.
As a primary means of predicting gene function, sequence similarity querying has become a research hotspot in bioinformatics. Similarity in gene sequence and structure usually determines similarity in gene function, so the function of a new gene can be predicted by searching for similar sequences. After analyzing related techniques used in the MRS index, such as the frequency transformation and the wavelet transformation, and discussing their shortcomings, this paper proposes a sequence similarity query processing technique based on the two-partitioning frequency transformation (2-PFT). First, the two-partitioning frequency transformation and its corresponding distance function are designed; they give the system a higher filtering ability than the frequency transformation and the wavelet transformation, which improves system performance significantly. Second, the problem of processing queries of arbitrary length is solved. Theoretical proof and experimental results both show that the 2-PFT system greatly outperforms the MRS system.