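An entropy-based method for estimating the data contamination rate is proposed, which avoids the problem of choosing a tolerance limit in traditional gross error detection. When the distribution model of the main body of the data is known, the total entropy is estimated from the sample data and the main-body entropy from the known distribution information, giving the change in entropy before and after contamination; the contamination rate is then estimated from the rate of entropy change. When the main distribution model is unknown, the main distribution information is obtained approximately by computing the entropy coefficient, and the contamination rate is again estimated from the rate of entropy change. The effect of truncation error in the entropy computation on the contamination rate estimate is analyzed. Numerical experiments show that this effect is small: when the truncation error reaches 0.01, the contamination rate estimate changes by 1%. Worked examples show that the entropy-based contamination rate estimation method is effective and reliable.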
An entropy-based method for estimating the contamination rate was proposed. It is useful in gross error statistics because it avoids the selection of an error limit. Two models of the main data distribution were suggested for investigating the contamination rate, and the corresponding entropy-based estimation methods were given. A numerical simulation was performed to analyze the influence of entropy truncation error on the estimation of the data contamination rate. The truncation error has little influence on the entropy-based estimate: when the truncation error is 0.01, the contamination rate estimate varies by only 1%. The examples show that the entropy-based estimation method is reliable and superior to the traditional estimation.