Cis-regulatory modules (CRMs) play an important role in the transcriptional regulation of eukaryotic genes, and identifying them is a major topic in computational biology. Many computational methods have been proposed for CRM identification, but their accuracy still leaves room for improvement. Combining multiple kinds of CRM feature information can help raise that accuracy. On this basis, this paper proposes SegHMC (Segmental HMM model for discovery of cis-regulatory module), an algorithm for identifying CRMs. The algorithm formulates CRM identification as a segmental HMM and extends the representation of the regulatory structure (or regulatory grammar) of a CRM: it not only represents a CRM as a combination of motifs, but also incorporates into the grammar features such as the co-occurrence frequency of motifs, the preferred order of motifs, and the distribution of distances between adjacent motifs within a CRM. Experiments on simulated datasets and on real biological datasets show that the proposed method identifies CRMs significantly more accurately than the current mainstream methods.
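To make the idea of a regulatory grammar concrete, the following is a minimal Python sketch of how motif order preference and the distance between adjacent motifs might be combined into a single log-score for one candidate module. It is an illustration only and not the SegHMC model: the motif names `A` and `B`, the probability tables `START` and `ORDER`, the geometric spacer parameter `GAP_P`, and the function names are all hypothetical assumptions, and a real segmental HMM would also score the motif sites themselves and the background sequence.

```python
import math

# Hypothetical toy parameters, chosen for illustration (not taken from the paper).
START = {"A": 0.6, "B": 0.4}                 # P(first motif in the module)
ORDER = {("A", "B"): 0.7, ("A", "A"): 0.3,   # P(next motif | previous motif)
         ("B", "A"): 0.5, ("B", "B"): 0.5}
GAP_P = 0.05                                 # assumed geometric spacer parameter


def log_gap(distance: int) -> float:
    """Log-probability of a spacer of `distance` bases under a geometric law."""
    if distance < 0:
        return float("-inf")  # overlapping sites are disallowed in this sketch
    return distance * math.log(1.0 - GAP_P) + math.log(GAP_P)


def module_log_score(sites):
    """Score motif sites given as (motif, start, end) tuples.

    Sites must be sorted by start position; `end` is exclusive.
    The score sums an order-preference term and a spacer-length term
    for every pair of adjacent sites.
    """
    if not sites:
        return float("-inf")
    score = math.log(START[sites[0][0]])
    for (prev, _, prev_end), (cur, cur_start, _) in zip(sites, sites[1:]):
        score += math.log(ORDER[(prev, cur)])    # motif order preference
        score += log_gap(cur_start - prev_end)   # distance between neighbours
    return score


near = module_log_score([("A", 0, 8), ("B", 18, 26)])
far = module_log_score([("A", 0, 8), ("B", 108, 116)])
print(near > far)  # closely spaced sites score higher under this toy spacer law
```

Under these assumed parameters, closely spaced sites in the preferred order receive a higher score than distant ones, which is the kind of signal a richer grammar can use in addition to motif presence alone.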