To address the limitation that the symbolic aggregate approximation (SAX) algorithm must split a time series into equal-length segments, this paper proposes SMSAX, a time series symbolization algorithm based on segmentation modes. A triangular threshold method extracts features from randomly sampled time series, and the maximum compression ratio of the series is computed and used as the time-window width for extracting segmentation points, from which the segment mode (SM) of the series is derived. The segment mode is then used to segment the series and reduce its dimensionality, and each resulting sub-sequence is symbolized as a vector of its mean and volatility. The algorithm thus segments a time series into unequal-length pieces according to its features and uses volatility to eliminate the influence of singular points. Experimental results show that SMSAX obtains more accurate results than SAX.
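The vector symbolization step can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes the segmentation points are already known (the triangular threshold extraction is not reproduced here), takes volatility to be the population standard deviation of each z-normalized sub-sequence, and uses the standard SAX Gaussian breakpoints for a four-symbol alphabet. The names `symbolize`, `cut_points`, and `VOL_BREAKPOINTS`, as well as the volatility thresholds themselves, are hypothetical choices made only for illustration.

```python
from bisect import bisect_right
from statistics import fmean, pstdev

# Standard SAX breakpoints for a 4-symbol alphabet (quartiles of N(0, 1)).
MEAN_BREAKPOINTS = [-0.6745, 0.0, 0.6745]
# Hypothetical volatility thresholds, chosen only for this illustration.
VOL_BREAKPOINTS = [0.25, 0.75]


def z_normalize(series):
    """Rescale the series to zero mean and unit variance."""
    mu = fmean(series)
    sigma = pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def symbolize(series, cut_points):
    """Map each unequal-length segment to a (mean symbol, volatility symbol) pair.

    cut_points: strictly increasing interior indices at which a new
    segment starts (assumed given; every segment must be non-empty).
    """
    z = z_normalize(series)
    bounds = [0, *cut_points, len(z)]
    word = []
    for start, end in zip(bounds, bounds[1:]):
        segment = z[start:end]
        mean_symbol = "abcd"[bisect_right(MEAN_BREAKPOINTS, fmean(segment))]
        vol_symbol = "LMH"[bisect_right(VOL_BREAKPOINTS, pstdev(segment))]
        word.append((mean_symbol, vol_symbol))
    return word
```

Under these assumptions, `symbolize([0, 2, 0, 2, 1, 1, 1, 1], [4])` gives two segments with the same mean symbol but different volatility symbols, which is the kind of distinction that a mean-only SAX word would lose.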