This paper reviews the two main methods of principal component analysis (PCA) for interval data, the vertices method (V-PCA) and the centers method (C-PCA), and proposes some reasonable improvements to both. Further study shows that the covariance matrices of the two methods are very similar. Building on a study of distances between interval numbers, a validity index based on the Hausdorff distance is defined to measure the goodness of fit of a method, and the two methods are then compared by simulation. The results show that the two methods are highly similar; that the goodness of fit of both decreases as the number of variables or the number of samples increases; and that, for a fixed number of samples, C-PCA is better suited to a large number of variables, whereas V-PCA is better suited to a small number of variables. Finally, the steps for choosing a suitable interval PCA method and measuring its goodness of fit are given, together with a worked example.
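As a rough illustration of the objects named in the abstract, the following sketch builds the data matrices that C-PCA and V-PCA would diagonalize and compares their covariance matrices; it also computes the Hausdorff distance between two closed intervals. It is not the paper's code. The function names and the toy data are made up for illustration, the population covariance (dividing by the number of rows) is an assumed convention, and the paper's validity index itself is not reproduced because its definition is not given here.

```python
from itertools import product


def hausdorff(a, b):
    """Hausdorff distance between closed intervals a=(lo, hi) and b=(lo, hi)."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def covariance(rows):
    """Population covariance matrix (divides by the number of rows)."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [sum((r[j] - mean[j]) * (r[k] - mean[k]) for r in rows) / n
         for k in range(p)]
        for j in range(p)
    ]


def centers(data):
    """C-PCA input: one row of interval midpoints per observation."""
    return [[(lo + hi) / 2 for lo, hi in obs] for obs in data]


def vertices(data):
    """V-PCA input: all 2**p hyper-rectangle vertices of each observation."""
    return [list(v) for obs in data for v in product(*obs)]


# Hypothetical toy data: 3 observations, 2 interval-valued variables.
data = [
    [(1.0, 3.0), (2.0, 4.0)],
    [(2.0, 6.0), (1.0, 2.0)],
    [(4.0, 5.0), (3.0, 7.0)],
]

cov_c = covariance(centers(data))   # matrix diagonalized by C-PCA
cov_v = covariance(vertices(data))  # matrix diagonalized by V-PCA

# Element-wise difference between the two covariance matrices.
diff = [[cov_v[j][k] - cov_c[j][k] for k in range(2)] for j in range(2)]
print(diff)
print(hausdorff(data[0][0], data[1][0]))
```

Under these assumptions the two covariance matrices share their off-diagonal entries and differ only on the diagonal, where the vertices version adds the mean squared half-width of each variable. This is one way to see why the two methods can behave so similarly.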