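Redundant training samples reduce both the efficiency and the accuracy of BMA parameter estimation. To address this, this paper proposes an improved model in which the k-nearest neighbor algorithm selects valuable training samples before the BMA computation, and only the selected samples are used to estimate the BMA parameters. Simulation experiments were carried out at the Wangjiaba station on the Huai River. Training samples were supplied to BMA under two schemes, with and without k-nearest neighbor selection, and the simulated streamflow at Wangjiaba under the two schemes was analyzed statistically to evaluate the performance of the improved BMA method. The results show that, with k-nearest neighbor sample selection, the forecast accuracy of the BMA model for both the flood process and the flood peak improves markedly, and that the spread of the probabilistic forecasts decreases while their reliability increases. Introducing k-nearest neighbor sample selection effectively removes redundant data from the BMA training samples, yields more reliable model parameters from fewer samples, and improves ensemble forecast performance.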
Bayesian model averaging (BMA) is a multi-model ensemble forecasting algorithm that uses Bayes' formula to estimate the posterior probability distribution of the forecast variables. The performance of BMA depends largely on the quality of its training datasets. However, these datasets often contain many redundant samples that are inconsistent with the current flow state and that degrade the accuracy and reliability of BMA forecasts. In this study, the k-nearest neighbor (KNN) method is applied to measure the similarity between the historical samples and the most recent flood process, in order to reduce the influence of redundant samples on the parameter estimation of BMA. Two cases of BMA, i.e., BMA with KNN sample selection (KBMA) and the original BMA, are investigated and compared at the Wangjiaba catchment, located in the upper region of the Huai River basin. The ensemble means of the two cases are examined against the observations and against the forecasts of their ensemble members to test the efficiency of their deterministic forecasts. Additionally, the probabilistic forecasts of the two cases are intercompared on the basis of two assessment criteria, the Coverage Rate and the Ranked Probability Score. The results indicate that KBMA produces better deterministic and probabilistic forecasts than the original BMA. By employing the KNN sample selection method, KBMA adjusts its parameters according to the real-time state of the flood process and of the ensemble members, rather than adjusting them using all samples. Our analysis demonstrates that the KNN sample selection method has the potential to substantially improve BMA ensemble forecasts.
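The sample selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the feature vector or the distance measure, so the sketch assumes that each sample is described by a short vector of recent flows and that similarity is measured by Euclidean distance. The function name `select_knn_samples` and the data values are hypothetical.

```python
import numpy as np

def select_knn_samples(history, current, k):
    """Return indices of the k historical samples closest to `current`.

    Assumption: similarity is plain Euclidean distance between feature
    vectors; the paper may use a different measure or feature set.
    """
    history = np.asarray(history, dtype=float)
    current = np.asarray(current, dtype=float)
    if not 1 <= k <= len(history):
        raise ValueError("k must be between 1 and the number of historical samples")
    # Distance from each historical feature vector to the current flow state
    distances = np.linalg.norm(history - current, axis=1)
    # Stable sort so that ties keep their original (chronological) order
    return np.argsort(distances, kind="stable")[:k]

# Hypothetical data: each row holds two recent flow values (m^3/s) for one time step
history = np.array([
    [100.0, 120.0],
    [900.0, 950.0],
    [110.0, 130.0],
    [500.0, 480.0],
])
current = np.array([105.0, 125.0])  # most recent flow state

selected = select_knn_samples(history, current, k=2)
training_subset = history[selected]  # samples that would be passed to BMA training
```

In this sketch only the rows of `training_subset` would be used to fit the BMA weights and variances, so the parameters follow the flow regime closest to the current state instead of the full historical record.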