Wavelet Synopsis Based Clustering of Parallel Data Streams

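Journal: Journal of Software (软件学报)
Time: 0
Pages: 644-658
Language: Chinese
Classification: TP311 [Automation and Computer Technology: Computer Software and Theory; Automation and Computer Technology: Computer Science and Technology]
Author affiliations: [1] School of Computer Science, Fudan University, Shanghai 200433; [2] College of Information Science and Engineering, Ningbo University, Ningbo, Zhejiang 315211
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60803021, 60973047; the Zhejiang Provincial Natural Science Foundation of China under Grant No. Y1091189; the Ningbo Municipal Natural Science Foundation of China under Grant Nos. 2007A610007, 2009A610072
Related project: Research on multi-core data stream join processors and related algorithms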

Keywords: clustering, synopsis, amnesic feature, discrete wavelet transform, data stream

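Chinese abstract (translated):

Many applications continuously produce large volumes of sequence data that evolve over time, forming time series data streams; examples include sensor networks, real-time stock quotes, and network and communication monitoring. Clustering is a powerful tool for analyzing such parallel multiple data streams. However, data streams are unbounded in length, evolve over time, and carry large data volumes, so traditional clustering methods cannot be applied directly. This paper exploits the amnesic feature of data streams and applies the discrete wavelet transform to maintain a synopsis structure for each data stream hierarchically and dynamically. Based on this synopsis, approximate distances between data streams and cluster centers are computed quickly, yielding a K-means clustering method suited to parallel multiple data streams. Experiments verify the effectiveness of the method.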
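The idea of computing approximate distances from a wavelet synopsis can be illustrated with a minimal sketch. It assumes an orthonormal Haar transform over a fixed window whose length is a power of two, and a synopsis that keeps only the first `k` (coarsest) coefficients. The function names `haar_dwt`, `build_synopsis`, and `approx_distance` are hypothetical, and the sketch does not model the hierarchical, amnesic structure described in the abstract.

```python
import math

def haar_dwt(values):
    """Orthonormal Haar transform of a list whose length is a power of two.

    Returns coefficients ordered from coarse to fine: the scaled overall
    sum first, then detail coefficients level by level.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx = list(values)
    details = []
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        sums = [(approx[i] + approx[i + 1]) / math.sqrt(2) for i in pairs]
        diffs = [(approx[i] - approx[i + 1]) / math.sqrt(2) for i in pairs]
        details = diffs + details  # coarser levels end up in front
        approx = sums
    return approx + details

def build_synopsis(values, k):
    """Keep only the k coarsest Haar coefficients (assumed truncation rule)."""
    return haar_dwt(values)[:k]

def approx_distance(synopsis_a, synopsis_b):
    """Euclidean distance between two synopses of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(synopsis_a, synopsis_b)))
```

Because the transform is orthonormal, the distance over all coefficients equals the Euclidean distance between the original windows, and a distance over a truncated synopsis never exceeds it. A call such as `approx_distance(build_synopsis(x, 2), build_synopsis(y, 2))` therefore gives a cheap lower bound on the true distance between windows `x` and `y`.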

English abstract:

In many real-life applications, such as stock markets, network monitoring, and sensor networks, data are modeled as dynamically evolving time series that are continuous and unbounded in nature, and many such data streams usually occur concurrently. Clustering is useful for analyzing such parallel data streams. This paper addresses the grouping of these evolving data streams. For this purpose, a synopsis is maintained dynamically for each data stream. The synopsis is constructed using the Discrete Wavelet Transform and exploits the amnesic feature of data streams. Using the synopsis, approximate distances between streams and cluster centers can be computed quickly, and an efficient online version of the classical K-means clustering algorithm is developed. Experiments demonstrate the effectiveness of the proposed method.
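As a rough illustration of running K-means directly on synopses, the sketch below clusters fixed-length coefficient vectors, one per stream. It is an offline simplification under stated assumptions, not the online algorithm of the paper: the name `kmeans_on_synopses` is hypothetical, the first `k` synopses are used as initial centers, and the iteration count is fixed. Since a wavelet transform is linear, averaging coefficient vectors gives the coefficients of the averaged series, so centers can be kept in coefficient space.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_on_synopses(synopses, k, iterations=10):
    """Cluster synopsis vectors with plain K-means.

    Assumptions for this sketch: the first k synopses seed the centers,
    and the loop runs a fixed number of iterations.
    Returns (labels, centers); labels come from the last assignment step.
    """
    if not 1 <= k <= len(synopses):
        raise ValueError("k must be between 1 and the number of streams")
    centers = [list(s) for s in synopses[:k]]
    labels = [0] * len(synopses)
    for _ in range(iterations):
        # Assignment step: each stream goes to its nearest center.
        labels = [
            min(range(k), key=lambda c: squared_distance(s, centers[c]))
            for s in synopses
        ]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [s for s, label in zip(synopses, labels) if label == c]
            if members:  # leave an empty cluster's center unchanged
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

In a streaming setting the synopses would be refreshed as new values arrive and the assignment step rerun on the updated vectors; how often to do so is a design choice this sketch leaves open.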
