This paper studies the fixed-K problem of the K-nearest neighbor imputation (KNNI) algorithm and finds that using a single fixed value of the parameter K when imputing missing values strongly affects imputation quality. To address this, it proposes a nearest neighbor imputation algorithm based on sparse coding (KNNI-SC). The algorithm reconstructs each sample that contains missing values from the training samples, taking the correlation between samples fully into account during reconstruction. It uses an l1-norm penalty in learning so that each such sample is imputed from a different number of training samples, which removes the need to select K in KNNI. Experimental comparisons using RMSE and the correlation coefficient as evaluation metrics show that the algorithm outperforms KNNI. The algorithm avoids this shortcoming of KNNI and is suitable for applications whose data preprocessing stage requires imputation of missing values.
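The reconstruction idea described above can be illustrated with a minimal sketch. The abstract does not give the objective function or the solver, so everything specific here is an assumption made for illustration: the lasso-style objective, the ISTA (proximal gradient) solver, the penalty weight `lam`, and the function names `soft_threshold`, `sparse_weights`, and `impute_sparse` are hypothetical and are not taken from the paper.

```python
import math


def soft_threshold(v, t):
    """Soft-thresholding operator, the proximal map of t * |v|."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def sparse_weights(train_obs, x_obs, lam=0.1, n_iter=500):
    """Approximately minimise
        0.5 * ||x_obs - sum_i w[i] * train_obs[i]||^2 + lam * ||w||_1
    with ISTA. The objective itself is an assumption, not the paper's.
    """
    n, d = len(train_obs), len(x_obs)
    # 1 / (squared Frobenius norm) is a safe, conservative step size.
    frob_sq = sum(v * v for row in train_obs for v in row)
    step = 1.0 / frob_sq if frob_sq > 0 else 1.0
    w = [0.0] * n
    for _ in range(n_iter):
        # Residual of the current reconstruction on the observed features.
        resid = [
            sum(w[i] * train_obs[i][j] for i in range(n)) - x_obs[j]
            for j in range(d)
        ]
        # Gradient step on the squared error, then soft-thresholding
        # for the l1 term, which sets small weights exactly to zero.
        w = [
            soft_threshold(
                w[i] - step * sum(train_obs[i][j] * resid[j] for j in range(d)),
                step * lam,
            )
            for i in range(n)
        ]
    return w


def impute_sparse(train, x, lam=0.1):
    """Fill the NaN entries of x using complete training rows.

    Weights are learned on the observed features only and then applied
    to the training rows' values at the missing positions.
    """
    obs = [j for j, v in enumerate(x) if not math.isnan(v)]
    w = sparse_weights(
        [[row[j] for j in obs] for row in train], [x[j] for j in obs], lam
    )
    filled = [
        v if not math.isnan(v)
        else sum(w[i] * train[i][j] for i in range(len(train)))
        for j, v in enumerate(x)
    ]
    return filled, w
```

Calling `impute_sparse(train, x)` with a list of complete training rows and a sample `x` whose missing entries are `float("nan")` returns the filled sample and the weight vector. In this sketch the number of nonzero weights plays the role of K: it is decided per sample by the l1 penalty rather than fixed in advance, which is the behaviour the abstract attributes to the l1 norm.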