Acquiring complete, real-time tunnel operational data, such as environmental conditions and traffic status, and mining it in depth is the foundation for improving emergency response capability and enabling early safety warning. This paper proposes a missing-data imputation method based on random forest. The incomplete data set is first partitioned according to which features are missing. A random forest regression model is then built to impute the missing values iteratively, and a stopping criterion for the iteration is defined. The optimal combination of the number of decision trees and the number of variables randomly sampled at each split is selected by minimizing the normalized root mean square error (NRMSE). Imputation results on a highway tunnel operational data set with missing values show that the method is accurate and robust: compared with KNN, SVD, MICE, and PPCA imputation, it reduces the NRMSE by at least 25%. In addition, parallel computation greatly improves imputation efficiency, compensating for the slow imputation speed of the method and ensuring that imputation is both effective and timely.
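For illustration only, the procedure summarized above might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not specify the initialization, the exact stopping criterion, or the NRMSE normalization, so the sketch assumes column-mean initialization, a missForest-style rule (stop once the change between successive imputations first increases), and an NRMSE normalized by the variance of the true values. It also assumes scikit-learn's `RandomForestRegressor`, with `n_jobs` standing in for the parallel computation. The function names `nrmse`, `rf_impute`, and `tune` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def nrmse(x_true, x_imputed, mask):
    """Normalized RMSE over the cells flagged in `mask` (assumed definition)."""
    diff = x_true[mask] - x_imputed[mask]
    return float(np.sqrt(np.mean(diff ** 2) / np.var(x_true[mask])))


def rf_impute(x, n_trees=100, max_features=None, max_iter=10, n_jobs=-1, seed=0):
    """Iteratively fill NaNs in a 2-D array with random forest regressors.

    Assumes every column has at least one observed value.
    """
    x = np.asarray(x, dtype=float)
    mask = np.isnan(x)
    if not mask.any():
        return x.copy()
    # Initial guess (assumption): column means of the observed values.
    imputed = np.where(mask, np.nanmean(x, axis=0), x)
    # Visit incomplete columns from fewest to most missing values.
    order = [j for j in np.argsort(mask.sum(axis=0)) if mask[:, j].any()]
    prev_change = np.inf
    for _ in range(max_iter):
        previous = imputed.copy()
        for j in order:
            miss = mask[:, j]  # split rows by whether feature j is missing
            others = np.delete(imputed, j, axis=1)
            model = RandomForestRegressor(
                n_estimators=n_trees,       # number of decision trees
                max_features=max_features,  # variables sampled at each split
                n_jobs=n_jobs,              # trees are grown in parallel
                random_state=seed,
            )
            model.fit(others[~miss], imputed[~miss, j])
            imputed[miss, j] = model.predict(others[miss])
        # Relative change between successive imputations on the missing cells.
        change = (np.sum((imputed[mask] - previous[mask]) ** 2)
                  / np.sum(imputed[mask] ** 2))
        if change > prev_change:
            return previous  # assumed rule: stop once the change starts to grow
        prev_change = change
    return imputed


def tune(x_complete, mask, tree_grid=(50, 100), mtry_grid=(1, 2), n_jobs=-1):
    """Grid-search tree count and split sample size by minimum NRMSE."""
    x_missing = np.where(mask, np.nan, x_complete)
    scores = {}
    for n_trees in tree_grid:
        for mtry in mtry_grid:
            filled = rf_impute(x_missing, n_trees=n_trees,
                               max_features=mtry, n_jobs=n_jobs)
            scores[(n_trees, mtry)] = nrmse(x_complete, filled, mask)
    return min(scores, key=scores.get), scores
```

In this sketch, `tune` expects a complete data set together with a Boolean mask of artificially removed cells, so that the NRMSE can be computed against known values. Each value in `mtry_grid` must not exceed the number of predictor columns, that is, the number of features minus one.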