Feature representation and similarity measurement of time series are important foundations of time series data mining. Existing sequence representation methods have difficulty reflecting the morphological trends of a series, which makes similarity measurements inaccurate. To address this problem, a new similarity measurement algorithm based on morphological patterns is proposed. Building on piecewise linear representation, the algorithm classifies the segments of a series into morphological patterns according to how the slope changes over different time periods, and represents each pattern with a dedicated character, thereby converting the time series into a character string. The longest common subsequence (LCS) method is then used to compute the distance between two such strings, which serves as the distance between the corresponding time series. Experiments verify the effectiveness of the method. Theoretical analysis and experimental results show that the method is insensitive to the values of individual data points, reduces interference from noise, and achieves high accuracy.
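The pipeline summarized above (piecewise linear representation, slope-based pattern characters, LCS distance) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The piecewise linear representation here uses fixed-length windows fitted by least squares. The five-character alphabet (`U`, `u`, `f`, `d`, `D`) and the slope thresholds `flat` and `steep` are hypothetical. The distance normalizes the LCS length by the length of the longer string. All function names are illustrative.

```python
from typing import List, Sequence


def plr_slopes(series: Sequence[float], seg_len: int = 4) -> List[float]:
    """Fixed-length piecewise linear representation (a simplifying assumption):
    fit a least-squares line to each non-overlapping window and return its slope.
    Trailing points that do not fill a whole window are ignored."""
    if seg_len < 2:
        raise ValueError("seg_len must be at least 2 to define a slope")
    x_mean = (seg_len - 1) / 2
    den = sum((x - x_mean) ** 2 for x in range(seg_len))
    slopes = []
    for start in range(0, len(series) - seg_len + 1, seg_len):
        window = series[start:start + seg_len]
        y_mean = sum(window) / seg_len
        num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(window))
        slopes.append(num / den)
    return slopes


def slope_to_symbol(slope: float, flat: float = 0.1, steep: float = 1.0) -> str:
    """Map a segment slope to a pattern character (hypothetical alphabet):
    U steep rise, u gentle rise, f flat, d gentle fall, D steep fall."""
    if slope >= steep:
        return "U"
    if slope >= flat:
        return "u"
    if slope > -flat:
        return "f"
    if slope > -steep:
        return "d"
    return "D"


def to_pattern_string(series, seg_len=4, flat=0.1, steep=1.0) -> str:
    """Convert a time series into a string of morphological pattern characters."""
    return "".join(slope_to_symbol(s, flat, steep) for s in plr_slopes(series, seg_len))


def lcs_length(a: str, b: str) -> int:
    """Longest common subsequence length via O(len(a) * len(b)) dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ca == cb else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_distance(a: str, b: str) -> float:
    """Assumed normalization: 1 - LCS / length of the longer string (0 = identical)."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else 1.0 - lcs_length(a, b) / longest


def morph_distance(x, y, seg_len=4) -> float:
    """Distance between two time series = LCS distance of their pattern strings."""
    return lcs_distance(to_pattern_string(x, seg_len), to_pattern_string(y, seg_len))


if __name__ == "__main__":
    x = [0, 1, 2, 3, 3, 3, 3, 3, 3, 2, 1, 0]
    y = [v + 10 for v in x]                    # same shape, shifted upward
    z = [3, 2, 1, 0, 0, 0, 0, 0, 0, 1, 2, 3]   # mirrored shape
    print(to_pattern_string(x), to_pattern_string(y))  # UfD UfD
    print(morph_distance(x, y))                        # 0.0
    print(round(morph_distance(x, z), 3))              # 0.667
```

Because each segment is reduced to the direction and rough magnitude of its slope, shifting a series vertically leaves its pattern string unchanged, which is one way to read the claim of insensitivity to data-point values. Note that fixed slope thresholds are scale-dependent, so in this sketch the series would likely need normalization before symbolization.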