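The Nonnegative Matrix Factorization Algorithm (NMF) has been widely applied in many fields, but it is easily affected by outliers. Among the various improvements proposed for this problem, the Robust Nonnegative Matrix Factorization Algorithm (RNMF), which uses the L2,1 norm, has achieved good results; however, it does not adapt well to changes in the proportion of outliers in a dataset. To address this shortcoming, we propose the Capped Robust Nonnegative Matrix Factorization Algorithm (CRNMF), which introduces a denoising rate ε into the objective function to reduce the influence of outliers on the overall algorithm. The main steps are as follows: at each iterative update of the factorization, compute the error between the input data and its reconstruction from the factors, truncate to zero the error of every data point whose error exceeds the preset parameter ε, and repeat until convergence. The ε-truncation reduces the influence of outliers on the basis matrix F and the coefficient matrix G. We give an algorithmic description of CRNMF and report experiments on synthetic and real datasets. The experiments show that, compared with conventional NMF and RNMF, the proposed algorithm improves clustering accuracy to some extent, reduces the effect of outliers on clustering accuracy, and improves the robustness of the algorithm.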
The Nonnegative Matrix Factorization Algorithm (NMF) has been widely applied in various areas, but it is easily influenced by outliers. To address this problem, researchers have proposed the Robust Nonnegative Matrix Factorization Algorithm (RNMF), which uses the L2,1 norm so that normal points are approximated as closely as possible while the contribution of outliers is reduced by penalizing each residual by its magnitude rather than its square. However, RNMF is still sensitive to the proportion of outliers: on some datasets it handles outliers well, but on others its performance is not satisfactory. Every real dataset has its own structure and therefore contains a different proportion of outliers, which limits RNMF in practical applications. In this article, we present a Capped Robust Nonnegative Matrix Factorization Algorithm (CRNMF), obtained by adding a denoising rate ε to the objective function of RNMF. The aim is to control outliers better when the outlier ratio differs from one real dataset to another. The main idea of CRNMF is to evaluate, during the iterative procedure, the residual of each data point from the input data and the current factors. If the residual is larger than the given denoising rate ε, the residual is set to 0; that is, the corresponding data point is treated as an outlier and excluded from the computation. By introducing this ε truncation, the algorithm reduces the influence of outliers on the basis matrix F and the coefficient matrix G. This paper describes CRNMF and reports experiments on real-world and synthetic datasets. The experimental results show that, compared with traditional NMF and RNMF, the proposed algorithm can improve clustering accuracy, reduce the impact of outliers, and thereby improve the robustness of the algorithm to some extent.
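The truncation step described above can be illustrated with a minimal NumPy sketch. It is an illustration under stated assumptions, not the update rules of the paper, which the abstract does not give. The sketch assumes the commonly used reweighted multiplicative updates for an L2,1-norm objective, in which each data point (a column of X) receives a weight equal to the inverse of its residual norm, and it implements the ε truncation by setting that weight to zero whenever the residual exceeds ε. The function name `capped_rnmf` and the parameters `n_iter`, `seed`, and `tiny` are hypothetical.

```python
import numpy as np


def capped_rnmf(X, k, eps, n_iter=200, seed=0, tiny=1e-10):
    """Sketch of a capped, L2,1-style NMF: X (m x n) ~ F (m x k) @ G (k x n).

    Columns of X are data points. Points whose residual norm exceeds
    `eps` get weight zero and therefore do not influence F.
    Assumed update rules; not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    F = rng.random((m, k)) + tiny
    G = rng.random((k, n)) + tiny

    for _ in range(n_iter):
        # Residual norm of every data point under the current factors.
        residual = np.linalg.norm(X - F @ G, axis=0)

        # L2,1 reweighting (1 / residual), truncated to 0 above eps.
        keep = residual <= eps
        d = np.where(keep, 1.0 / np.maximum(residual, tiny), 0.0)

        # Weighted multiplicative update of the basis matrix F.
        XD = X * d  # scales column i of X by d[i]
        GD = G * d  # scales column i of G by d[i]
        F *= (XD @ G.T) / (F @ (GD @ G.T) + tiny)

        # Update of G. The per-point weight cancels column by column,
        # so the plain NMF rule is used; truncated points are still
        # re-fitted and can re-enter if their residual drops below eps.
        G *= (F.T @ X) / (F.T @ F @ G + tiny)

    outlier_mask = np.linalg.norm(X - F @ G, axis=0) > eps
    return F, G, outlier_mask
```

In this sketch ε acts as a threshold on the residual norm, so it has to be larger than the residuals of normal points at initialization; if it is too small, every point is truncated and F receives no information. How ε should be chosen for a given outlier ratio is not stated in the abstract.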