Based on the nucleotide distribution characteristics of nucleosome-positioning and nucleosome-depleted (inhibiting) sequences, a model was built with the Increment of Diversity with Quadratic Discriminant (IDQD) method to distinguish these two classes of sequences. The mean area under the receiver operating characteristic (ROC) curve reaches 0.958. The model was then used to analyze the nucleosome formation potential of sequences flanking splice sites (GT/AG) in the human genome. The results show that DNA sequences corresponding to exons tend to form nucleosomes and that the RNA transcribed from them is, statistically, relatively rigid, whereas DNA sequences corresponding to the splice sites and their adjacent intron regions tend to be nucleosome-free and the RNA transcribed from them is, statistically, relatively flexible. Moreover, nucleosome positioning versus depletion in the DNA sequence is statistically correlated with rigidity versus flexibility of the RNA around the splice sites, which provides a basis for a mechanistic explanation of why pre-mRNA splicing events are related to the nucleosome positioning information in DNA sequences.
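As an illustration of the quantity that IDQD takes its name from, the following minimal sketch computes an increment of diversity between two sequences. It assumes the commonly used definitions $D(X) = N \log N - \sum_i n_i \log n_i$ and $ID(X, S) = D(X + S) - D(X) - D(S)$; the abstract does not state which sequence features were used, so the dinucleotide counts, the function names, and the parameter `k` are assumptions made only for this example.

```python
import math
from collections import Counter
from itertools import product


def kmer_counts(seq, k=2):
    """Count overlapping k-mers of seq over ACGT, in a fixed lexicographic order.

    The choice of k-mers as features is an assumption for illustration.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts.get(m, 0) for m in kmers]


def diversity(counts):
    """Diversity measure D(X) = N log N - sum_i n_i log n_i (0 log 0 := 0)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(x, s):
    """ID(X, S) = D(X + S) - D(X) - D(S); smaller values mean more similar counts."""
    merged = [a + b for a, b in zip(x, s)]
    return diversity(merged) - diversity(x) - diversity(s)


if __name__ == "__main__":
    query = kmer_counts("ACGTACGTTTAAGC")
    source = kmer_counts("TTTTAAAATTTAAA")
    print(increment_of_diversity(query, source))
```

In a scheme of this kind, the increment of diversity of a query sequence against each class (positioning and depleted) would typically serve as an input feature to a quadratic discriminant function; that second step is not shown here.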