Accurate data preprocessing is a prerequisite and a key step for correct, effective data mining and for system modeling based on data samples. An important preprocessing task is to eliminate anomalous samples from a large data set, or to clean and repair samples that have been contaminated. When the relationships among the data items (attributes) of the samples are unknown, however, anomalous samples are difficult to detect. This paper therefore proposes a method based on wavelet analysis for detecting and repairing anomalous data samples. Because it takes full advantage of the multi-scale, multi-resolution character of wavelet analysis, the method detects and repairs anomalous samples accurately. To enable fast numerical computation of the wavelet transform of a discrete sequence, the paper also proposes a modified numerical integration algorithm based on the Newton-Cotes formulas. Simulation results show that the proposed method is feasible, performs well, and is of strong practical value.
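The abstract does not spell out the modified Newton-Cotes algorithm. As a hedged illustration of the underlying idea only, the sketch below evaluates one continuous wavelet transform coefficient W(a, b) = a^(-1/2) ∫ x(t) ψ((t - b)/a) dt of a uniformly sampled sequence with closed Newton-Cotes quadrature: composite Simpson 1/3 rule, switching to Simpson's 3/8 rule on the last three intervals when the number of intervals is odd. The Mexican-hat mother wavelet, the sampling grid x[k] = x(k·dt), and the names `mexican_hat`, `newton_cotes`, and `cwt_coefficient` are assumptions made for this example; it is not the authors' modified algorithm.

```python
import math
from typing import Callable, Sequence


def mexican_hat(t: float) -> float:
    """Mexican-hat (Ricker) mother wavelet with unit L2 norm (assumed choice)."""
    c = 2.0 / (math.sqrt(3.0) * math.pi ** 0.25)
    return c * (1.0 - t * t) * math.exp(-0.5 * t * t)


def newton_cotes(y: Sequence[float], h: float) -> float:
    """Integrate equally spaced samples y (spacing h) with closed Newton-Cotes rules.

    Composite Simpson 1/3 covers an even number of intervals; when the interval
    count is odd, the last three intervals use Simpson's 3/8 rule instead.
    """
    n = len(y) - 1  # number of intervals
    if n < 1:
        return 0.0
    if n == 1:  # trapezoidal rule, the one-interval Newton-Cotes formula
        return 0.5 * h * (y[0] + y[1])
    m = n if n % 2 == 0 else n - 3  # intervals handled by Simpson 1/3
    total = 0.0
    if m > 0:
        s = y[0] + y[m]
        s += 4.0 * sum(y[i] for i in range(1, m, 2))
        s += 2.0 * sum(y[i] for i in range(2, m, 2))
        total += h * s / 3.0
    if m < n:  # three leftover intervals: Simpson 3/8
        total += 3.0 * h / 8.0 * (y[m] + 3.0 * y[m + 1] + 3.0 * y[m + 2] + y[m + 3])
    return total


def cwt_coefficient(x: Sequence[float], dt: float, a: float, b: float,
                    psi: Callable[[float], float] = mexican_hat) -> float:
    """W(a, b) = a**-0.5 * integral of x(t) * psi((t - b) / a) dt, x[k] = x(k * dt).

    psi is assumed real-valued, so no complex conjugate is taken.
    """
    integrand = [xk * psi((k * dt - b) / a) for k, xk in enumerate(x)]
    return newton_cotes(integrand, dt) / math.sqrt(a)
```

For example, sampling x(t) = ψ(t - 10) on [0, 20] with dt = 0.05 gives `cwt_coefficient(x, 0.05, 1.0, 10.0)` close to 1, because the wavelet has unit energy, while a constant signal gives a coefficient close to 0, because the Mexican hat has zero mean. Both Simpson rules integrate cubics exactly, which is a quick correctness check for the quadrature.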
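The abstract also leaves the detection and repair procedure unspecified. The following is one plausible, hypothetical instantiation of multi-scale wavelet-based detection, not the paper's method. Each sample gets a Mexican-hat-like detail coefficient at dilations s = 1 and s = 2, computed as the unit-norm negative second difference (2·x[k] - x[k-s] - x[k+s]) / √6 (the Mexican hat is proportional to the negative second derivative of a Gaussian). A sample is flagged only if its detail exceeds a robust threshold, κ times a median-absolute-deviation noise estimate, at every scale, and flagged samples are repaired by linear interpolation from their nearest unflagged neighbours. The scales, κ = 4, the `floor` parameter, and the function names `detail`, `detect`, and `repair` are all assumptions.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple


def detail(x: Sequence[float], k: int, s: int) -> float:
    """Mexican-hat-like detail at dilation s: the unit-norm negative second
    difference (2*x[k] - x[k-s] - x[k+s]) / sqrt(6)."""
    return (2.0 * x[k] - x[k - s] - x[k + s]) / math.sqrt(6.0)


def detect(x: Sequence[float], scales: Tuple[int, ...] = (1, 2),
           kappa: float = 4.0, floor: float = 1e-12) -> List[int]:
    """Flag interior samples whose detail is an outlier at every scale."""
    smax = max(scales)
    idx = range(smax, len(x) - smax)  # samples near either end are not tested
    if len(idx) == 0:
        return []
    flagged = set(idx)
    for s in scales:
        d = [detail(x, k, s) for k in idx]
        med = median(d)
        sigma = median([abs(v - med) for v in d]) / 0.6745  # robust MAD estimate
        thr = max(kappa * sigma, floor)  # floor guards against sigma == 0
        flagged &= {k for k, v in zip(idx, d) if abs(v) > thr}
    return sorted(flagged)


def repair(x: Sequence[float], flagged: Sequence[int]) -> List[float]:
    """Replace flagged samples by linear interpolation between the nearest
    unflagged neighbours, or copy the only neighbour available."""
    bad = set(flagged)
    y = list(x)
    for k in sorted(bad):
        left = next((i for i in range(k - 1, -1, -1) if i not in bad), None)
        right = next((i for i in range(k + 1, len(x)) if i not in bad), None)
        if left is not None and right is not None:
            w = (k - left) / (right - left)
            y[k] = (1.0 - w) * x[left] + w * x[right]
        elif left is not None:
            y[k] = x[left]
        elif right is not None:
            y[k] = x[right]
    return y
```

For instance, with `signal = [math.sin(0.1 * k) for k in range(100)]` and a synthetic spike `signal[50] += 5.0`, `detect(signal)` returns `[50]`. Samples 49 and 51 respond strongly at scale 1 but not at scale 2 (and samples 48 and 52 the reverse), so requiring agreement across scales keeps the spike's neighbours from being flagged. `repair(signal, [50])` then replaces the spike with the mean of samples 49 and 51. Runs of several consecutive anomalies would need wider scales than this sketch uses.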