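Reconstructing gene regulatory networks can uncover potential regulatory relationships among genes and help us understand complex regulatory mechanisms more deeply; it has become an active research topic in systems biology. A large number of theoretical models and computational methods for network inference have been developed. Some information-theoretic methods, however, tend to miss regulatory relationships during computation, and these missed relationships may represent true regulation between genes. To overcome this shortcoming and further improve the accuracy of network reconstruction, this paper proposes a new method for constructing gene regulatory networks that uses conditional mutual information together with resampling and a weighted-averaging strategy. The algorithm first resamples the gene expression data to obtain a series of sub-datasets, then builds a network for each sub-dataset using conditional mutual information, and finally takes a weighted average of these sub-networks to obtain the final gene regulatory network. The algorithm is validated on DREAM3 simulated data and on the real SOS network data; compared with several other popular methods, it achieves higher precision and accuracy.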
Reconstruction of gene regulatory networks (GRNs) from large-scale expression data can reveal potential causal relationships among genes and help explain the complex regulatory mechanisms of cellular systems. It is of great interest and remains a challenging computational problem. Over the past decades, numerous theoretical and computational approaches have been introduced for inferring GRNs, and each existing method for inferring GRNs from gene expression profiles has its own strengths and weaknesses. In particular, many properties of GRNs, such as topological sparseness and non-linear dependence, are generally present in regulatory mechanisms but are seldom taken into account simultaneously in a single computational method. Some information-theoretic algorithms cannot recover true positive edges once those edges have been deleted at an earlier stage of the computation, even though such interactions may reflect actual relationships between genes. To overcome these disadvantages and to further enhance the precision and robustness of inferred GRNs, we present an ensemble method that infers GRNs from gene expression data by combining two strategies: resampling and arithmetic mean fusion. In this algorithm, the jackknife resampling procedure is first employed to form a series of sub-datasets of the gene expression data; conditional mutual information is then used to generate the corresponding sub-networks from the sub-datasets; and the final GRN is inferred by integrating these sub-networks with an arithmetic mean fusion strategy. In comparisons with state-of-the-art algorithms on the benchmark synthetic GRN datasets from the DREAM3 challenge and on a real SOS DNA repair network, the results show that our method significantly outperforms the LP, LASSO, and ARACNE methods and achieves high and robust performance.
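The three steps described above (jackknife resampling, a conditional mutual information network per sub-dataset, arithmetic mean fusion) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several choices here are assumptions that the abstract does not specify: the delete-one form of the jackknife, a Gaussian covariance-based estimator of conditional mutual information, conditioning on at most one other gene at a time, and the threshold value of 0.1. The function names `gaussian_cmi`, `cmi_network`, and `jackknife_ensemble` are hypothetical.

```python
import numpy as np


def gaussian_cmi(data, i, j, cond=()):
    """Estimate I(X_i; X_j | X_cond), ASSUMING jointly Gaussian expression.

    data is a (samples x genes) array; cond is a tuple of gene indices.
    Uses I = 0.5 * log(|C_iZ| * |C_jZ| / (|C_Z| * |C_ijZ|)).
    """
    def logdet(idx):
        if not idx:
            return 0.0  # log-determinant of an empty covariance (det = 1)
        cov = np.atleast_2d(np.cov(data[:, list(idx)], rowvar=False))
        return np.linalg.slogdet(cov)[1]

    cond = tuple(cond)
    return 0.5 * (logdet((i,) + cond) + logdet((j,) + cond)
                  - logdet(cond) - logdet((i, j) + cond))


def cmi_network(data, threshold=0.1):
    """Build one sub-network: a symmetric matrix of edge weights.

    An edge (i, j) is scored by the smallest value among the plain mutual
    information and every first-order CMI given one other gene k. Edges
    scoring below the (assumed) threshold get weight 0.
    """
    n_genes = data.shape[1]
    weights = np.zeros((n_genes, n_genes))
    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            score = gaussian_cmi(data, i, j)
            for k in range(n_genes):
                if k != i and k != j:
                    score = min(score, gaussian_cmi(data, i, j, (k,)))
            if score >= threshold:
                weights[i, j] = weights[j, i] = score
    return weights


def jackknife_ensemble(data, threshold=0.1):
    """Delete-one jackknife over samples, then arithmetic mean of sub-networks."""
    n_samples = data.shape[0]
    subnets = [cmi_network(np.delete(data, s, axis=0), threshold)
               for s in range(n_samples)]
    return np.mean(subnets, axis=0)
```

Calling `jackknife_ensemble(expr)` on a samples-by-genes array `expr` returns a symmetric weight matrix. Because the fusion step averages over all sub-networks, an edge that is pruned in some sub-datasets but retained in others still receives a non-zero weight, which is the mechanism by which an ensemble of this kind can recover edges that a single thresholded pass would delete.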