Objective To investigate the performance of random forest (RF) regression in estimating causal relationship networks. Methods Causal networks were specified in simulation experiments; after the data were standardized, the networks were estimated with full-conditional RF regression and the accuracy of the estimates was evaluated. The method was also applied to ovarian cancer gene expression profile data, and the results were validated. Results In the simulations, RF regression identified the prespecified network relationships markedly better than the Bayesian network method. In particular, when the threshold was chosen appropriately, the accuracy of networks reconstructed with RF regression improved steadily as the sample size increased, whereas the performance of the traditional Bayesian network method remained essentially unchanged. The real-data analysis confirmed that the RF regression approach yields a network structure consistent with that in existing databases. Conclusion Networks estimated with RF regression can reach relatively high accuracy even when the sample size is small.
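The abstract gives no implementation details, so the following is only a minimal sketch of how full-conditional RF regression might be turned into a network estimate: standardize the data, regress each variable on all the others with a small random forest, and keep the links whose importance passes a threshold. Everything specific here is an assumption for illustration: the function names (`standardize`, `rf_importance`, `estimate_network`), the hand-rolled dependency-free forest, the importance measure (share of in-bag variance removed by splits on a predictor), the tree settings, and the threshold of 0.2. It is not the authors' code.

```python
import random
from statistics import mean, pstdev

def standardize(cols):
    """Scale each variable (a list of floats) to mean 0 and SD 1."""
    return [[(v - mean(c)) / pstdev(c) for v in c] for c in cols]

def sse(ys):
    """Sum of squared deviations from the mean (node impurity)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def grow(X, y, rows, depth, mtry, rng, imp):
    """Grow one regression tree; add each split's impurity decrease to imp."""
    if depth == 0 or len(rows) < 10:
        return
    parent = sse([y[r] for r in rows])
    best = None
    for f in rng.sample(range(len(X)), mtry):      # random predictor subset
        vals = sorted(set(X[f][r] for r in rows))
        step = max(1, len(vals) // 10)             # about 10 candidate cuts
        for t in vals[step::step]:
            left = [r for r in rows if X[f][r] < t]
            right = [r for r in rows if X[f][r] >= t]
            gain = parent - sse([y[r] for r in left]) - sse([y[r] for r in right])
            if best is None or gain > best[0]:
                best = (gain, f, left, right)
    if best is None or best[0] <= 0:
        return
    gain, f, left, right = best
    imp[f] += gain
    grow(X, y, left, depth - 1, mtry, rng, imp)
    grow(X, y, right, depth - 1, mtry, rng, imp)

def rf_importance(X, y, n_trees=30, depth=4, seed=0):
    """Share of in-bag variance of y removed by splits on each predictor."""
    rng = random.Random(seed)
    n, p = len(y), len(X)
    imp, total = [0.0] * p, 0.0
    mtry = max(1, p // 3)                          # assumed setting
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        total += sse([y[r] for r in rows])
        grow(X, y, rows, depth, mtry, rng, imp)
    return [v / total for v in imp] if total > 0 else imp

def estimate_network(data, threshold=0.2, **kw):
    """Regress every variable on all others; keep links above the threshold."""
    z = standardize(data)
    p = len(z)
    W = [[0.0] * p for _ in range(p)]              # W[j][i]: predictor i, target j
    for j in range(p):
        others = [i for i in range(p) if i != j]
        scores = rf_importance([z[i] for i in others], z[j], **kw)
        for i, w in zip(others, scores):
            W[j][i] = w
    edges = {(i, j) for j in range(p) for i in range(p) if W[j][i] >= threshold}
    return W, edges

# Simulated chain x1 -> x2 -> x3 plus an unrelated variable x4.
rng = random.Random(1)
n = 150
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [2 * a + rng.gauss(0, 0.5) for a in x1]
x3 = [-1.5 * b + rng.gauss(0, 0.5) for b in x2]
x4 = [rng.gauss(0, 1) for _ in range(n)]
W, edges = estimate_network([x1, x2, x3, x4])
for row in W:
    print(" ".join(f"{w:.2f}" for w in row))
print(sorted(edges))
```

Two caveats apply to this sketch. Full-conditional regression scores association given the other variables, so a pair `(i, j)` in `edges` is a candidate link from predictor `i` to target `j`, not a proven causal direction. The threshold also sets the trade-off between missed and spurious links, so in a simulation it would normally be tuned against the known network.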