Nonnegative Matrix Factorization (NMF) is a recently proposed technique that uses a nonlinear procedure to obtain a purely additive, parts-based, linear, low-dimensional representation of nonnegative multivariate data. Because it makes the latent structure, features, or patterns in the data explicit, NMF has been applied successfully as a feature-extraction method in many research fields. However, NMF intrinsically operates on vectors: to process a set of data matrices, each matrix must first be vectorized, which often turns the corresponding learning problem into a typical small-sample problem. As a result, NMF applied to a matrix set suffers from two main problems, weak descriptive power (unsatisfactory accuracy) and poor generalization. To overcome these two problems while retaining the good properties of NMF, this paper proposes Nonnegative Matrix-Set Factorization (NMSF). Unlike NMF, which processes the vectorized matrices, NMSF processes the original data matrices directly. Theoretical analysis shows that, when processing a set of data matrices, NMSF should be more accurate and generalize better than NMF. To show how NMSF can be implemented and to validate its properties experimentally, the Bilinear Form-Based NMSF (BFBNMSF) algorithm is constructed as one implementation of NMSF. The results of comparison experiments between BFBNMSF and NMF consistently support the conclusions of the theoretical analysis. It is worth noting that higher accuracy and better generalization mean that NMSF is better than NMF at capturing the essential features of data matrices.
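The small-sample issue raised above can be made concrete with a short sketch. It is only an illustration under stated assumptions: the toy data, the sizes `k`, `m`, `n`, `r`, and the helper `nmf` (the standard Lee-Seung multiplicative updates for the Frobenius loss) are hypothetical and not taken from the paper. The bilinear form `A_i ≈ L @ D_i @ R` used for the parameter count is likewise an assumed reading of "bilinear form-based" and may differ from the actual BFBNMSF formulation.

```python
import numpy as np

def nmf(V, r, n_iter=200, eps=1e-9, seed=0):
    """Standard multiplicative-update NMF, V ≈ W @ H (Frobenius loss)."""
    rng = np.random.default_rng(seed)
    d, k = V.shape
    W = rng.random((d, r)) + eps   # nonnegative basis, d x r
    H = rng.random((r, k)) + eps   # nonnegative coefficients, r x k
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy matrix set: k nonnegative matrices of size m x n.
rng = np.random.default_rng(1)
k, m, n, r = 20, 32, 32, 5
matrices = rng.random((k, m, n))

# NMF needs every matrix flattened into one column of length m*n,
# so 20 samples live in a 1024-dimensional space.
V = matrices.reshape(k, m * n).T   # shape (m*n, k)
W, H = nmf(V, r)

# Shared basis parameters that must be learned from only k samples.
basis_params_nmf = W.size                  # m * n * r
# Assumed bilinear form A_i ≈ L @ D_i @ R with L: m x r and R: r x n.
basis_params_bilinear = m * r + r * n

print(V.shape, basis_params_nmf, basis_params_bilinear)
```

With these toy sizes the vectorized basis has far more entries than the two bilinear factors combined, which is one way to see why working on the matrices directly can ease the small-sample problem.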