Continuous queries over data streams, especially continuous aggregate queries, are among the difficult problems in data stream processing. Sketch-based algorithms can estimate the equi-join size over data streams with high precision, while histogram-based algorithms can capture the distribution of a data stream fairly accurately. Combining the strengths of these two techniques, this paper proposes an efficient algorithm for processing complex aggregate queries over data streams; the algorithm provides approximate answers to a certain class of such queries. Theoretical analysis and experimental results show that the algorithm achieves high precision with small space complexity.
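To make the sketch half of this idea concrete, the following is a minimal illustration of how a classic AMS ("tug-of-war") sketch can approximate an equi-join size over two streams. It is not the algorithm proposed in this paper, and it leaves out the histogram component entirely. The names `AMSSketch`, `estimate_join_size`, and `exact_join_size`, the parameters `rows`, `cols`, and `seed`, and the choice of a cubic polynomial hash over a prime field are all assumptions made for this example; stream values are assumed to be non-negative integers.

```python
import random
import statistics
from collections import Counter

_PRIME = 2_147_483_647  # Mersenne prime 2^31 - 1, used as the hash field size


class AMSSketch:
    """Illustrative tug-of-war sketch: rows x cols counters, one +/-1 hash each."""

    def __init__(self, rows=5, cols=64, seed=0):
        rng = random.Random(seed)
        self.rows, self.cols = rows, cols
        # One random cubic polynomial per counter (a standard 4-wise independent family).
        self.coeffs = [
            [tuple(rng.randrange(1, _PRIME) for _ in range(4)) for _ in range(cols)]
            for _ in range(rows)
        ]
        self.counters = [[0] * cols for _ in range(rows)]

    @staticmethod
    def _sign(coeff, value):
        a, b, c, d = coeff
        h = (((a * value + b) * value + c) * value + d) % _PRIME
        return 1 if h & 1 else -1  # low bit of the hash picks the sign

    def update(self, value, count=1):
        """Process one stream element (or `count` copies of it)."""
        for i in range(self.rows):
            for j in range(self.cols):
                self.counters[i][j] += count * self._sign(self.coeffs[i][j], value)


def estimate_join_size(sketch_r, sketch_s):
    """Median over rows of the mean counter product (median-of-means)."""
    if sketch_r.coeffs != sketch_s.coeffs:
        raise ValueError("sketches must share hash functions (same seed and shape)")
    row_means = [
        sum(x * y for x, y in zip(row_r, row_s)) / sketch_r.cols
        for row_r, row_s in zip(sketch_r.counters, sketch_s.counters)
    ]
    return statistics.median(row_means)


def exact_join_size(stream_r, stream_s):
    """Exact equi-join size, for comparison: sum over v of f_R(v) * f_S(v)."""
    freq_r, freq_s = Counter(stream_r), Counter(stream_s)
    return sum(f * freq_s[v] for v, f in freq_r.items())


if __name__ == "__main__":
    data_rng = random.Random(42)
    stream_r = [data_rng.randrange(50) for _ in range(2000)]
    stream_s = [data_rng.randrange(50) for _ in range(2000)]
    sk_r, sk_s = AMSSketch(seed=7), AMSSketch(seed=7)
    for v in stream_r:
        sk_r.update(v)
    for v in stream_s:
        sk_s.update(v)
    print("exact:", exact_join_size(stream_r, stream_s))
    print("estimate:", estimate_join_size(sk_r, sk_s))
```

Both sketches must be built with the same seed so that they share hash functions. The memory used depends only on `rows * cols`, not on the stream length, which is the kind of space saving the abstract refers to. Increasing `cols` reduces the variance of the estimate, and increasing `rows` makes a large deviation less likely.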