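The relationship between the three-dimensional structure of a protein and its function is a major scientific problem in contemporary life science, and comparing the similarity of protein 3D structures is an important means of exploring that relationship. Based on the particular spatial distribution of protein structures, this paper proposes a protein 3D structure similarity retrieval model under a multiple criteria framework. In this model, three kinds of protein features that are invariant to spatial rotation and translation are extracted, giving a multi-criteria similarity retrieval strategy based on the consistency of the backbone's spatial direction function, the consistency of the backbone distance histogram, and the consistency of the data distribution of the protein distance matrix. Experiments on a real database of 27804 protein samples show that the proposed retrieval scheme and the design of the similarity criteria are an effective approach to large-scale protein 3D structure similarity retrieval.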
The intrinsic relationship between the function of a protein and its structure is an important issue in contemporary life science. Although similarity comparison of protein structures can provide hints for such study, efficient retrieval of proteins based on 3D structure similarity remains a hard task because protein datasets are large and continually growing. To overcome this difficulty, a multiple criteria framework (MCF) is proposed to reduce the computational cost. Under MCF, three kinds of features that are invariant to translation and rotation are applied successively as criteria during retrieval: the spatial walking of the protein's backbone, the distance histogram, and the radial distribution of the distance matrix. While retrieval based on each of these features involves only simple calculation, the intersection of their retrieval results reduces the candidate set dramatically and rapidly. Query-by-example experiments on a representative database of 27804 samples demonstrate that the techniques effectively cut down the pruning time cost of traditional methods while retaining sensitivity. The approach is thus a useful complement to existing methods for rapid protein structure similarity retrieval.
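To make the idea of a rotation- and translation-invariant criterion concrete, the following is a minimal sketch of one of the three features named above, the distance histogram, together with successive filtering of a candidate set. It is an illustration under stated assumptions, not the authors' implementation: the function names, the bin width, the number of bins, the L1 comparison, the threshold, and the toy coordinates are all hypothetical choices that do not come from the paper.

```python
import math
from itertools import combinations


def distance_histogram(coords, bin_width=2.0, n_bins=16):
    """Normalized histogram of pairwise distances between backbone points.

    Pairwise distances do not change under rotation or translation,
    so neither does the histogram. bin_width and n_bins are assumed values.
    """
    hist = [0] * n_bins
    for p, q in combinations(coords, 2):
        d = math.dist(p, q)
        idx = min(int(d / bin_width), n_bins - 1)  # clamp long distances to the last bin
        hist[idx] += 1
    total = sum(hist) or 1
    return [h / total for h in hist]


def l1_distance(h1, h2):
    """Sum of absolute bin differences (an assumed dissimilarity measure)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def rotate_z_and_translate(coords, angle, shift):
    """Rigid motion used only to demonstrate invariance."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
            for x, y, z in coords]


def multi_criteria_retrieve(query, database, criteria):
    """Apply each (feature_fn, dist_fn, threshold) criterion in turn.

    A structure survives only if it passes every criterion, which equals
    the intersection of the per-criterion result sets.
    """
    candidates = set(database)
    for feature_fn, dist_fn, threshold in criteria:
        q_feat = feature_fn(query)
        candidates = {name for name in candidates
                      if dist_fn(q_feat, feature_fn(database[name])) <= threshold}
    return candidates


# Toy backbones (hypothetical coordinates): a helix-like curve and a straight chain.
helix = [(2.3 * math.cos(math.radians(100 * i)),
          2.3 * math.sin(math.radians(100 * i)),
          1.5 * i) for i in range(20)]
line = [(0.0, 0.0, 3.8 * i) for i in range(20)]
moved = rotate_z_and_translate(helix, 0.7, (5.0, -3.0, 12.0))

database = {"moved_helix": moved, "line": line}
criteria = [(distance_histogram, l1_distance, 0.1)]  # threshold is an assumed value
hits = multi_criteria_retrieve(helix, database, criteria)
```

In this sketch only one criterion is supplied; the backbone spatial walking and the radial distribution of the distance matrix would enter as further tuples in `criteria`, each one shrinking the candidate set before any more expensive comparison is attempted.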