关于区间数据的主成分分析(PCA)方法已取得了较丰富的研究成果,但少见对这些方法效度的评价研究.针对该问题,基于Hausdorff距离用于定义两个紧集之间距离的考虑,将区间数视为一个紧集,定义了区间数之间的距离,并研究了区间矩阵的距离.在此基础上,根据PCA方法的原理,建立了一个区间PCA方法的效度评价指标.该指标取值在0与1之间,其取值越大,说明区间PCA方法效度越高,反之则效度越小.最后,采用模拟的方法,分别选取均匀分布和正态分布两种类型的区间数据样本,对目前最常用的两种区间PCA方法——顶点法和中点法进行了效度分析,验证了文中所提的效度指标的正确性.
Many principal component analysis (PCA) methods for interval data have been proposed, but the validity of these methods has seldom been evaluated. To address this problem, an interval number is regarded as a compact set, and, because the Hausdorff distance defines the distance between two compact sets, the distance between two interval numbers is defined on that basis. The distance between two interval matrices is then studied. On this basis, and following the principle of PCA, a validity index for interval PCA methods is constructed. The index takes values between 0 and 1: the larger its value, the higher the validity of the interval PCA method, and the smaller its value, the lower the validity. Finally, a simulation study is carried out in which interval data samples of two types, uniformly distributed and normally distributed, are generated, and the validity of the two most commonly used interval PCA methods, vertices-PCA and centers-PCA, is analyzed. The simulation verifies the correctness of the proposed validity index.
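As an illustration of the distance underlying the index, the following minimal sketch computes the Hausdorff distance between two closed intervals, which for [a, b] and [c, d] reduces to max(|a - c|, |b - d|). The function names `interval_hausdorff` and `interval_matrix_distance` are hypothetical, and the matrix-level aggregation (a Frobenius-type norm of the entrywise distances) is an assumption made only for illustration; the abstract does not state which matrix distance the paper adopts.

```python
import numpy as np


def interval_hausdorff(x, y):
    """Hausdorff distance between closed intervals x = [a, b] and y = [c, d]."""
    a, b = x
    c, d = y
    if a > b or c > d:
        raise ValueError("lower endpoint must not exceed upper endpoint")
    # For closed intervals the Hausdorff distance is the larger endpoint gap.
    return max(abs(a - c), abs(b - d))


def interval_matrix_distance(lower_x, upper_x, lower_y, upper_y):
    """Distance between two interval matrices given by their bound matrices.

    ASSUMPTION: entrywise Hausdorff distances are aggregated with a
    Frobenius-type norm; this choice is illustrative, not the paper's
    stated definition.
    """
    lower_x, upper_x, lower_y, upper_y = (
        np.asarray(m, dtype=float)
        for m in (lower_x, upper_x, lower_y, upper_y)
    )
    # Entrywise Hausdorff distance between corresponding interval entries.
    entrywise = np.maximum(np.abs(lower_x - lower_y), np.abs(upper_x - upper_y))
    return float(np.sqrt(np.sum(entrywise ** 2)))
```

For example, `interval_hausdorff((0, 1), (2, 5))` evaluates max(|0 - 2|, |1 - 5|) and gives 4, and two identical intervals are at distance 0.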