针对化工过程数据中存在缺失数据的问题,在保持局部数据结构特征的基础上提出了基于局部加权重构的化工过程数据恢复算法。通过定位缺失的数据点并以符号Na N(Not a Number)标记,将缺失的数据集分为完备数据集和不完备数据集。不完备的数据集按照完整性的大小依次找到它们在完备数据集中相应的k个近邻,根据误差平方和最小的原则,求出k个近邻相应的权值,用k个近邻及相应的权值重构出缺失的数据点。将该算法应用在不同缺失率下的两种化工过程数据中并与望最大化主成分分析(EM-PCA)法和平均值(MA)两种传统的数据恢复算法相比较,该算法的恢复数据误差最小,并且计算速度相比EM-PCA算法平均提高了2倍。实验结果表明,局部加权重构的化工过程数据恢复算法可以有效地对数据进行恢复,提高了数据的利用率,适用于非线性化工过程缺失数据的恢复。
According to phenomenon of missing data in the chemical process, a Locally Weighted Recovery Algorithm( LWRA) for dealing with missing data in the chemical process was proposed based on preserving the local data structure characteristic. The missing data points were located and marked with the symbol Na N( Not a Number), the missing data set was divided into complete data set and incomplete data set. The corresponding k nearest neighbors of incomplete data set were found in the complete data according to the size of integrity in turn, and the corresponding weights of k nearest neighbors were calculated according to the principle of minimum error sum of squares. Finally, the missing data points were reconstructed by k nearest neighbors and their corresponding weights. The algorithm was applied into two types of chemical process data with different missing rates and compared with two traditional data recovery algorithms, Expectation Maximization Principal Component Analysis( EM-PCA) and Mean Algorithm( MA). The results reveal that the proposed method has the lowest error,and the computation speed increases by 2 times in average than EM-PCA. The experimental results demonstrate that the proposed algorithm can not only recover data efficiently but also improve the utilization rate of the data, and it's suitable for nonlinear chemical process data recovery.