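Canonical correlation analysis (CCA) is a statistical tool for analyzing the correlation between two sets of random variables. As a linear mathematical model, however, CCA cannot reveal the nonlinear correlations that are abundant in the real world. Adopting a localization approach and building on probabilistic CCA (PCCA) within a probabilistic mixture model framework, this paper proposes the mixture of probabilistic CCA (MixPCCA) model together with a two-stage expectation maximization (EM) algorithm for estimating the model parameters. It also presents a method that uses cluster ensembles to determine the number of local linear models, and a theoretical framework for applying the MixPCCA model to pattern recognition. Experiments on the handwritten datasets USPS and MNIST show that, by mixing multiple local linear PCCA models, MixPCCA not only offers a solution for capturing complex global nonlinear correlation but can also detect correlation that exists only in local regions.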
Canonical correlation analysis (CCA) is a statistical tool for analyzing the correlation between two sets of random variables. A critical limitation of CCA is that it can only detect linear correlation between the two domains that is globally valid throughout both data sets, so it cannot reveal the many nonlinear correlations that occur in the real world. There are three main ways to address this limitation: kernel mapping, neural networks, and localization. In this paper, a mixture of local linear probabilistic canonical correlation analysis (PCCA) models, called MixPCCA, is constructed based on the idea of localization, and a two-stage expectation maximization (EM) algorithm is proposed to estimate the model parameters. Determining the number of local linear models is a fundamental issue, which we address using the framework of cluster ensembles. In addition, a theoretical framework for applying the MixPCCA model to pattern recognition is put forward. Results on the USPS and MNIST handwritten image datasets demonstrate that the proposed MixPCCA model not only provides a solution for capturing complex global nonlinear correlation, but is also able to detect correlation that exists only in local regions, which traditional CCA and PCCA fail to discover.
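
The limitation described above can be illustrated with a small sketch. It is not the MixPCCA model or its two-stage EM algorithm: it uses classical (non-probabilistic) CCA and a hand-chosen hard split of the data, where MixPCCA would learn the local models from the data. The synthetic data, the function name `first_canonical_correlation`, and the regularization constant `reg` are assumptions made for this illustration only.

```python
import numpy as np


def first_canonical_correlation(X, Y, reg=1e-8):
    """Largest canonical correlation between two views (classical linear CCA).

    X and Y are (n_samples, n_features) arrays with matching rows.
    `reg` is a small ridge term, an assumption added for numerical stability.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)
    # Whiten each view with its Cholesky factor; the singular values of the
    # whitened cross-covariance are the canonical correlations.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    T = np.linalg.solve(Lx, Sxy) @ np.linalg.inv(Ly).T
    return np.linalg.svd(T, compute_uv=False)[0]


# Synthetic two-view data (hypothetical): the first coordinate of Y depends on
# the first coordinate of X through |x|, a relation that is linear only locally.
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-1.0, 1.0, size=n)
X = np.column_stack([x, rng.normal(size=n)])
Y = np.column_stack([np.abs(x) + 0.05 * rng.normal(size=n), rng.normal(size=n)])

# One global linear model sees almost no correlation.
global_rho = first_canonical_correlation(X, Y)

# Hard split into two local regions (assumed known here, not learned).
left = x < 0
left_rho = first_canonical_correlation(X[left], Y[left])
right_rho = first_canonical_correlation(X[~left], Y[~left])

print(f"global: {global_rho:.3f}  left: {left_rho:.3f}  right: {right_rho:.3f}")
```

In this toy setting the global canonical correlation is close to zero while each local region shows a strong linear correlation, which is the kind of structure a mixture of local linear models is meant to capture. A mixture model would replace the hard split with soft, learned assignments of samples to components.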