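The gene associations studied in biology are intrinsic relationships akin to causality, so the key problem is to find a method that can describe such relationships. For the problem of identifying the structure of the E. coli gene regulatory network in the third Dialogue for Reverse Engineering Assessments and Methods challenge (DREAM3), we propose a statistical independence measure based on reproducing kernel Hilbert spaces (RKHS): the Hilbert-Schmidt independence criterion (HSIC). It is a distribution-based, nonparametric measure of independence: it does not require the data to follow any particular distribution, does not use external criteria such as classification rate or model simplicity as constraints, and quantifies the strength of association between variables nonparametrically. Experiments on the E. coli gene expression data show that, although the time-series samples in the dataset are small and carry only weak regulatory information of complex types, HSIC still identifies these implicit and complex regulatory relationships well. Comparative results show that, across three dataset sizes, the AUROC of HSIC is 23 percentage points higher than that of Granger causality (GC) and 3.9 percentage points higher than that of the top-ranked entry in the challenge, and that HSIC is three orders of magnitude more computationally efficient than the differential-equation method used by that entry.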
The key to modeling a genetic system is identifying the causal relationships among its genes. In the third Dialogue for Reverse Engineering Assessments and Methods (DREAM3) competition, an E. coli dataset was generated from a "true" biological gene network, and the aim of this work is to recover the structure of that network from the data. We present a statistical independence measure based on reproducing kernel Hilbert spaces (RKHS), the Hilbert-Schmidt independence criterion (HSIC). Unlike other approaches, which rely either on classification rate or on parametric models, the proposed criterion is a nonparametric, direct measure of independence. Comparative experiments show that the method recovers the regulatory relationships between genes effectively even with small data samples. Specifically, HSIC achieves better results than both the classical Granger causality (GC) method and the differential-equation-based method that ranked first in the DREAM3 contest. The AUROC obtained by HSIC is 23 percentage points higher than that of the GC method and 3.9 percentage points higher than that of the best performer in the contest. In addition, the HSIC method is three orders of magnitude more computationally efficient than the differential-equation-based method.
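As an illustration of the kind of independence score described above, the following is a minimal sketch of the standard biased empirical HSIC estimator, tr(KHLH)/(n-1)^2, for two one-dimensional samples. It is not the implementation used in this work: the Gaussian kernel, the median-heuristic bandwidth, the function names (`gaussian_gram`, `center`, `hsic`), and the synthetic data are all assumptions made for the example.

```python
import math
import random
from statistics import median


def gaussian_gram(values, sigma=None):
    """Gaussian-kernel Gram matrix of a 1-D sample (kernel choice is an assumption)."""
    n = len(values)
    d2 = [[(values[i] - values[j]) ** 2 for j in range(n)] for i in range(n)]
    if sigma is None:
        # Median heuristic for the bandwidth (an assumption, not from the abstract).
        positive = [d2[i][j] for i in range(n) for j in range(i + 1, n) if d2[i][j] > 0]
        sigma = math.sqrt(0.5 * median(positive)) if positive else 1.0
    return [[math.exp(-d / (2.0 * sigma ** 2)) for d in row] for row in d2]


def center(gram):
    """Double-center a symmetric Gram matrix: H K H with H = I - (1/n) 1 1^T."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    total_mean = sum(row_means) / n
    return [
        [gram[i][j] - row_means[i] - row_means[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def hsic(x, y):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2."""
    n = len(x)
    k_centered = center(gaussian_gram(x))
    l_gram = gaussian_gram(y)
    # For symmetric matrices, tr(HKH L) equals the sum of elementwise products.
    trace = sum(k_centered[i][j] * l_gram[i][j] for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2


# Synthetic data (hypothetical, for illustration only).
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(100)]
dependent = [xi ** 2 + 0.1 * rng.gauss(0.0, 1.0) for xi in x]  # nonlinear dependence
independent = [rng.gauss(0.0, 1.0) for _ in range(100)]        # unrelated to x

score_dep = hsic(x, dependent)
score_ind = hsic(x, independent)
print("dependent pair:  ", score_dep)
print("independent pair:", score_ind)
```

In a network-identification setting, one plausible use of such a score would be to compute it for every ordered pair of genes and rank candidate edges by the result; how the pairs and time lags are formed from the expression time series is not specified in the abstract.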