Considering how the choice of training samples and input variables affects the accuracy of a prediction model, this paper proposes a support vector machine (SVM) method for predicting PM2.5 concentration based on K-means clustering and partial least squares (PLS). First, the K-means algorithm clusters the meteorological attributes, which indirectly divides the PM2.5 series into several classes with high internal similarity; each class serves as the training sample set for its own prediction model. Next, PLS extracts principal components from the many factors that influence PM2.5 concentration, and these components serve as the optimized inputs of each class model. Finally, the appropriate class is selected according to the meteorological attributes of the day to be predicted, and the PM2.5 concentration prediction model is built from the optimized training samples and input variables. Using measured data from a monitoring site in Beijing as an example, experiments were run with both the improved model and the traditional model. The results show that the improved SVM is markedly more accurate than the traditional SVM: the error metrics MAE, MAPE and RMSE decreased by 38.10%, 50.59% and 37.15%, respectively. The study demonstrates that introducing K-means clustering and PLS is a feasible way to improve the accuracy of the traditional SVM in PM2.5 concentration prediction.
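The reported improvements are relative decreases in three error metrics: MAE, MAPE and RMSE. As a minimal sketch of how such figures are obtained, the snippet below uses the standard textbook definitions of the three metrics and compares a baseline with an improved set of predictions. The function names and the toy concentration values are assumptions made purely for illustration; they are not the paper's data and will not reproduce its 38.10%, 50.59% and 37.15%.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def relative_decrease(baseline, improved):
    """Percentage by which the improved error is lower than the baseline error."""
    return 100.0 * (baseline - improved) / baseline


# Hypothetical toy values (illustrative only, not taken from the paper).
observed = [50.0, 80.0, 100.0, 40.0]
baseline_pred = [60.0, 60.0, 120.0, 50.0]   # stands in for the traditional SVM
improved_pred = [55.0, 70.0, 110.0, 45.0]   # stands in for the improved SVM

for name, metric in (("MAE", mae), ("MAPE", mape), ("RMSE", rmse)):
    base = metric(observed, baseline_pred)
    new = metric(observed, improved_pred)
    print(f"{name}: {base:.2f} -> {new:.2f} "
          f"(decrease {relative_decrease(base, new):.2f}%)")
```

MAPE normalizes each error by the observed value, so it can fall by a different percentage than MAE or RMSE on the same predictions. This is consistent with the three metrics in the abstract decreasing by different amounts.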