We propose a method that combines symbolic time series analysis with the K-Nearest Neighbors (K-NN) algorithm to forecast the overall distribution of high-frequency financial volatility from symbolic time series histograms. The original time series is first converted into a symbolic time series, and the histogram of the symbol sequence is used to represent its distribution. We introduce the concept of a symbolic histogram time series and use the K-NN algorithm to forecast the symbol sequence histogram of the next period. Within the K-NN algorithm, and in view of the characteristics of symbol sequence histograms, we propose the Euclidean norm, the chi-square (χ²) statistic, and the relative entropy as measures of similarity between symbolic histogram series when selecting neighbors, and we use the geometric properties of the system itself to determine the embedding dimension of the symbolic histogram series. The forecasting ability of the method is tested on high-frequency data from the Shanghai Composite Index sampled at 5-minute intervals. The results show that the overall forecast errors all fall within an acceptable range; the forecast distributions have the same mean as the actual distributions but a smaller variance.
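The pipeline summarized above (symbolize, build one histogram per period, find the K most similar past histogram sequences, average their successors) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: symbols come from a global equal-probability partition, histograms are taken over single symbols rather than longer symbol words, the chi-square measure uses a common symmetric form, the embedding dimension `d` is a fixed parameter rather than being derived from the geometry of the system, and the forecast is the plain average of the neighbors' successors. All function names are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log(0) for empty bins


def symbol_histograms(periods, n_symbols=4):
    """Turn a (n_periods, n_obs) array into one normalized symbol histogram per period.

    Assumption: one global quantile partition is shared by all periods.
    """
    edges = np.quantile(periods, np.linspace(0, 1, n_symbols + 1)[1:-1])
    symbols = np.searchsorted(edges, periods, side="right")  # values in 0..n_symbols-1
    hists = np.stack([np.bincount(row, minlength=n_symbols) for row in symbols])
    hists = hists.astype(float)
    return hists / hists.sum(axis=1, keepdims=True)


def euclidean(p, q):
    return float(np.linalg.norm(p - q))


def chi_square(p, q):
    # Symmetric chi-square form; the paper's exact definition may differ.
    return float(np.sum((p - q) ** 2 / (p + q + EPS)))


def relative_entropy(p, q):
    p, q = p + EPS, q + EPS
    return float(np.sum(p * np.log(p / q)))


def knn_forecast(hists, d=2, k=3, dist=euclidean):
    """Forecast the histogram of the period that follows the last row of hists."""
    n = len(hists)
    query = hists[n - d:]  # the d most recent histograms
    # Each candidate window hists[i:i+d] has a known successor hists[i+d].
    scores = np.array([
        sum(dist(hists[i + j], query[j]) for j in range(d))
        for i in range(n - d)
    ])
    nearest = np.argsort(scores)[:k]
    forecast = np.mean([hists[i + d] for i in nearest], axis=0)
    return forecast / forecast.sum()


if __name__ == "__main__":
    # Synthetic stand-in data: 40 periods of 48 observations each (not real market data).
    rng = np.random.default_rng(0)
    periods = rng.normal(size=(40, 48))
    hists = symbol_histograms(periods)
    for dist in (euclidean, chi_square, relative_entropy):
        print(dist.__name__, knn_forecast(hists, d=2, k=3, dist=dist))
```

Averaging the successors of several neighbors tends to smooth the predicted histogram, which is at least consistent with the reported result that the forecast distributions match the mean of the actual distributions but have a smaller variance.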