Multidimensional time series are an important class of data objects in information systems, and similarity search is central to their applications. A common way to compare two sequences (or subsequences) is to map them to curves in a multidimensional space and compute the Euclidean distance between the curves. The main drawback of this approach is that it captures only the overall distance between the sequences (or subsequences) and does not reflect their local variations. To address this problem, this paper proposes a new fast similarity search method for multidimensional time series. The method combines the local variation (shape) features of sequences (or subsequences) with the spatial index structure, a k-d tree, so that local variation matching is performed during the k-d tree search itself at no extra cost, which greatly improves query efficiency and accuracy. Experimental results demonstrate that the algorithm is effective and efficient.
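The limitation of a purely Euclidean comparison can be illustrated with a small sketch. It is not the paper's algorithm: the `trend` feature (the sign of each step's change in every dimension) is a hypothetical stand-in for a local shape feature, the sample sequences are made up for illustration, and the k-d tree integration is not shown.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length multidimensional sequences."""
    return math.sqrt(
        sum((x - y) ** 2 for p, q in zip(a, b) for x, y in zip(p, q))
    )


def trend(seq):
    """Hypothetical shape feature (an assumption, not from the paper):
    the sign of the change between consecutive points, per dimension."""
    def sign(v):
        return (v > 0) - (v < 0)

    return [
        tuple(sign(x1 - x0) for x0, x1 in zip(p, q))
        for p, q in zip(seq, seq[1:])
    ]


# Illustrative 2-dimensional sequences of length 3.
query = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
shifted = [(1.0, 0.0), (2.0, 1.0), (3.0, 2.0)]  # same shape, offset in dimension 0
zigzag = [(1.0, 0.0), (0.0, 1.0), (3.0, 2.0)]   # different local behaviour

# Both candidates are equally far from the query in Euclidean terms ...
print(euclidean(query, shifted) == euclidean(query, zigzag))

# ... but only one of them follows the same local changes as the query.
print(trend(query) == trend(shifted))
print(trend(query) == trend(zigzag))
```

Under these assumptions, the Euclidean distance cannot separate `shifted` from `zigzag`, while the step-wise trend feature can, which is the kind of gap a shape-aware search is meant to close.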