Abstract: Although a distributed control system (DCS) can collect large volumes of operating data from a process-industry production site in real time, only a small fraction of these data can be used as auxiliary-variable samples for a soft sensor model, because most samples of the dominant variables, which reflect the quality indicators, must be obtained through time-consuming manual analysis or from on-line quality instruments. As a result, training samples for the soft sensor model are hard to collect, the vast majority of the data collected by the DCS cannot be used effectively, and the accuracy of machine learning suffers. To address these problems, a maximum entropy method is used to estimate the joint probability distribution of the soft sensor variables, and a Bayesian maximum a posteriori (MAP) estimate, combined with cluster analysis, is used to fill in the samples that lack manual analysis values. Simulation results show that the proposed method can effectively complete the missing part of the samples, thereby increasing the number of usable samples and improving the training accuracy of the model.
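The imputation idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on one simplifying assumption that the abstract does not state: if only the mean and covariance of the variables are constrained, the maximum entropy distribution is Gaussian, so within each cluster the MAP estimate of the missing dominant variable y given the auxiliary variables x reduces to the Gaussian conditional mean. The function names `kmeans`, `fit_cluster_gaussians`, and `map_impute`, as well as the `min_samples` threshold, are hypothetical and chosen only for this illustration.

```python
import numpy as np

def nearest(X, centers):
    # Index of the closest center (squared Euclidean distance) for each row of X.
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

def kmeans(X, k, iters=50, seed=0):
    # Plain Lloyd's algorithm on the auxiliary variables (illustrative clustering step).
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels = nearest(X, centers)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def fit_cluster_gaussians(X_lab, y_lab, centers, min_samples=10):
    # Assumption: with mean/covariance constraints, the maximum entropy
    # joint density of (x, y) in each cluster is Gaussian.
    assign = nearest(X_lab, centers)
    models = {}
    for j in range(len(centers)):
        mask = assign == j
        if mask.sum() < min_samples:
            continue  # too few labeled samples to estimate a covariance
        Z = np.column_stack([X_lab[mask], y_lab[mask]])
        mu = Z.mean(axis=0)
        cov = np.cov(Z, rowvar=False) + 1e-6 * np.eye(Z.shape[1])
        models[j] = (mu, cov)
    return models

def map_impute(X_unl, centers, models):
    # MAP estimate of y given x under a Gaussian joint: the conditional mean
    # mu_y + S_yx S_xx^{-1} (x - mu_x). Rows in clusters without a model stay NaN.
    assign = nearest(X_unl, centers)
    y_hat = np.full(len(X_unl), np.nan)
    for j, (mu, cov) in models.items():
        mask = assign == j
        if not mask.any():
            continue
        mu_x, mu_y = mu[:-1], mu[-1]
        S_xx, S_xy = cov[:-1, :-1], cov[:-1, -1]
        w = np.linalg.solve(S_xx, S_xy)
        y_hat[mask] = mu_y + (X_unl[mask] - mu_x) @ w
    return y_hat
```

In this sketch the cluster centers are fitted on all auxiliary-variable samples, labeled or not, the per-cluster Gaussians are fitted on the labeled subset only, and `map_impute` then fills in the missing values so that the completed samples can be added to the training set. A richer set of maximum entropy constraints would give a non-Gaussian density, and the MAP step would then need a numerical maximization instead of the closed form used here.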