From a computational perspective, and given the limitations of currently available protein-RNA interaction data, a prediction model is constructed based solely on the sequence composition of proteins and RNAs. Interaction propensities between amino acid triplets and nucleotides are derived from 3149 protein-RNA interaction pairs in the PDB. A weighted interaction propensity measure is then defined to quantify the triplet-nucleotide interaction propensity for each protein-RNA sequence pair. To avoid feature redundancy, feature vectors are built from the triplet-nucleotide combinations with higher propensity together with composition-based features, and are used to predict protein-RNA interactions. The computational results show the effectiveness of the prediction model (an SVM model) and the algorithm.
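
The weighted propensity features can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula, so the code below assumes, purely for illustration, that each triplet-nucleotide propensity is weighted by the product of the triplet's frequency in the protein and the nucleotide's frequency in the RNA. The function names, the `propensity` table, and `selected_pairs` (the higher-propensity combinations retained as features) are hypothetical and not taken from the paper.

```python
from collections import Counter


def triplet_freqs(protein):
    """Relative frequency of each overlapping amino acid triplet."""
    triplets = [protein[i:i + 3] for i in range(len(protein) - 2)]
    total = len(triplets)
    if total == 0:
        return {}
    return {t: c / total for t, c in Counter(triplets).items()}


def nucleotide_freqs(rna):
    """Relative frequency of each nucleotide in the RNA sequence."""
    total = len(rna)
    if total == 0:
        return {}
    return {n: c / total for n, c in Counter(rna).items()}


def weighted_propensity(protein, rna, propensity):
    """Assumed weighting: freq(triplet) * freq(nucleotide) * propensity.

    `propensity` maps (triplet, nucleotide) to a precomputed interaction
    propensity; pairs missing from the table are treated as 0.0.
    """
    tf = triplet_freqs(protein)
    nf = nucleotide_freqs(rna)
    return {
        (t, n): tf[t] * nf[n] * propensity.get((t, n), 0.0)
        for t in tf
        for n in nf
    }


def feature_vector(protein, rna, propensity, selected_pairs):
    """Keep only the selected triplet-nucleotide pairs, in a fixed order."""
    wp = weighted_propensity(protein, rna, propensity)
    return [wp.get(pair, 0.0) for pair in selected_pairs]


# Toy data, invented for illustration only.
toy_propensity = {("KRK", "U"): 2.0}
toy_selected = [("KRK", "U"), ("AKR", "A"), ("XXX", "C")]
toy_features = feature_vector("AKRKR", "AUUG", toy_propensity, toy_selected)
```

In a full pipeline, composition-based features would be appended to each vector, and the vectors for known interacting and non-interacting pairs would be passed to an SVM classifier for training.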