Hydrological time series carry information about the laws of natural evolution and about the influence of human activity on the underlying land surface. Sequential pattern mining can uncover physical regularities hidden in these spatio-temporal series, such as flood frequency and abrupt changes in hydrological regime, and thereby provide decision support for hydrological forecasting and flood-control scheduling. A motif, a term borrowed from biology, is a pattern of similar segments that recurs within a set of sequences; time series motif mining is accordingly the process of finding such recurring similar segments in a time series with data mining techniques. Considering the characteristics of hydrological time series and the need to mine flood and drought events, this paper proposes GSB-VLMD (Grammars & Semantics Based-Variable Length Motifs Discovery), a motif mining method based on wavelet transform, extreme-point decomposition, and symbolization. The wavelet transform denoises the data and yields a smoother series; extreme-point decomposition extracts extreme-value semantic information, such as floods and droughts, from the smoothed data; and symbolization discretizes the data to provide input for the Sequitur algorithm, which performs the motif mining. The method is applied to nearly 50 years of water-level records from Taihu Lake, and the results demonstrate its correctness and practicality.
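The four stages named in the abstract (denoise, extract extremes, symbolize, mine repeats) can be illustrated with a minimal sketch. It is not the GSB-VLMD implementation. Every function name, the single-level Haar wavelet with hard thresholding, and the fixed breakpoints are assumptions made for illustration, since the abstract does not specify them. The final stage counts repeated substrings as a simplified stand-in for Sequitur, which instead infers a hierarchical grammar from the symbol string.

```python
from bisect import bisect_right
from collections import defaultdict


def haar_denoise(x, threshold):
    """One-level Haar transform, hard-threshold the details, then invert.
    (Assumed wavelet; the abstract does not say which one is used.)"""
    n = len(x) - len(x) % 2              # process an even number of samples
    out = list(x)                        # a trailing odd sample is left as-is
    for i in range(0, n, 2):
        avg = (x[i] + x[i + 1]) / 2      # approximation coefficient
        diff = (x[i] - x[i + 1]) / 2     # detail coefficient
        if abs(diff) < threshold:        # small details are treated as noise
            diff = 0.0
        out[i], out[i + 1] = avg + diff, avg - diff
    return out


def extreme_points(x):
    """Return (index, value) wherever the series changes direction.
    Flat runs are skipped, so a plateau yields one turning point."""
    pts, prev_dir = [], 0
    for i in range(1, len(x)):
        d = x[i] - x[i - 1]
        cur = (d > 0) - (d < 0)          # +1 rising, -1 falling, 0 flat
        if cur != 0:
            if prev_dir != 0 and cur != prev_dir:
                pts.append((i - 1, x[i - 1]))
            prev_dir = cur
    return pts


def symbolize(values, breakpoints, alphabet="abcd"):
    """Map each value to a letter by the interval it falls in.
    Needs len(alphabet) >= len(breakpoints) + 1; breakpoints sorted."""
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in values)


def repeated_patterns(s, min_len=2, max_len=4):
    """Stand-in for Sequitur: list start positions of every substring
    that occurs more than once (overlapping occurrences are counted)."""
    positions = defaultdict(list)
    for length in range(min_len, max_len + 1):
        for i in range(len(s) - length + 1):
            positions[s[i:i + length]].append(i)
    return {p: pos for p, pos in positions.items() if len(pos) > 1}


def pipeline(series, threshold, breakpoints):
    """Chain the four stages and return (symbol string, repeated patterns)."""
    smooth = haar_denoise(series, threshold)
    values = [v for _, v in extreme_points(smooth)]
    symbols = symbolize(values, breakpoints)
    return symbols, repeated_patterns(symbols)
```

Because only the extreme points are symbolized, one symbol can stand for a peak or trough spanning any number of raw samples, which is one way repeated symbol patterns can correspond to variable-length segments of the original series. The threshold and breakpoints would need to be chosen for the data at hand.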