Symbolic data analysis is an emerging data mining technique, and interval-valued data are the most commonly used type of symbolic data. This paper applies principal component analysis (PCA) for interval-valued symbolic data to the evaluation of stocks' overall market performance. The basic theory of symbolic data analysis is first introduced. The computation of empirical descriptive statistics for interval data samples is then studied and, based on the empirical correlation matrix, an interval PCA algorithm is given whose principal component scores are themselves intervals. Finally, an empirical study is carried out on one week of trading data for 20 stocks listed on the Shanghai stock exchange market. Based on a rectangle plot of the interval principal component scores, the 20 stocks are classified into four groups according to their overall market performance.
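The pipeline summarized above (empirical statistics, correlation matrix, interval-valued scores) can be illustrated with a minimal sketch. The abstract does not state the formulas, so the code below rests on assumptions: it uses the empirical mean and variance commonly used in symbolic data analysis, which treat values as uniformly distributed within each interval, a covariance computed from interval midpoints, and interval arithmetic to project each standardized interval onto the eigenvectors. The function name `interval_pca` and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np


def interval_pca(lower, upper, n_components=2):
    """Sketch of PCA for interval data [lower, upper] (assumed formulation)."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    n = lower.shape[0]

    # Empirical mean: average of the interval midpoints.
    mid = (lower + upper) / 2.0
    mean = mid.mean(axis=0)

    # Empirical variance, assuming a uniform distribution inside each interval.
    var = (lower**2 + lower * upper + upper**2).sum(axis=0) / (3.0 * n) - mean**2
    std = np.sqrt(var)

    # Covariance from midpoints (assumption); diagonal uses the interval variance.
    cov = mid.T @ mid / n - np.outer(mean, mean)
    np.fill_diagonal(cov, var)
    corr = cov / np.outer(std, std)

    # Eigen-decomposition of the symmetric correlation matrix, largest first.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Standardize the interval bounds.
    z_lo = (lower - mean) / std
    z_hi = (upper - mean) / std

    # Interval arithmetic: a positive loading takes its minimum at the lower
    # bound, a negative loading takes its minimum at the upper bound.
    pos = np.clip(eigvecs, 0.0, None)
    neg = np.clip(eigvecs, None, 0.0)
    pc_lo = z_lo @ pos + z_hi @ neg
    pc_hi = z_hi @ pos + z_lo @ neg
    return eigvals, eigvecs, pc_lo, pc_hi


# Made-up toy data: 4 items, 2 interval variables (not the paper's data).
toy_lower = [[10.0, 1.0], [12.0, 2.0], [9.0, 1.5], [15.0, 3.0]]
toy_upper = [[11.0, 1.8], [13.5, 2.6], [10.5, 2.5], [16.0, 4.2]]
eigvals, eigvecs, pc_lo, pc_hi = interval_pca(toy_lower, toy_upper)
for i, (lo, hi) in enumerate(zip(pc_lo, pc_hi)):
    print(f"item {i}: PC1 in [{lo[0]:.3f}, {hi[0]:.3f}], PC2 in [{lo[1]:.3f}, {hi[1]:.3f}]")
```

Each item's pair of intervals on the first two components defines a rectangle in the principal plane, which is the kind of representation the classification into groups relies on.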