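Uncertain data streams are widespread in real-world applications such as mobile computing, radio frequency identification (RFID), and sensor networks, so processing queries quickly within limited storage is an important problem in uncertain data stream management. This paper studies Top-K queries over uncertain data streams under the sliding-window model and proposes a corresponding algorithm. The algorithm stores the uncertain stream data using the sliding-window model and maintains three synopsis tables, in which the tuples of the current window are stored in order of arrival, score, and existential probability, respectively. It repeatedly selects, from among the highest-scoring tuples, the set of k tuples with the highest probabilities and computes the probability of each such set. We prove theoretically that the set with the highest probability among these k-tuple sets is the Top-K query result. Experimental results show that the proposed algorithm outperforms other similar algorithms in time and space complexity.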
Uncertain data streams arise in a wide spectrum of real-world applications, such as mobile computing, radio frequency identification (RFID), and wireless sensor networks, so managing them has become an important problem in stream data mining. This paper tackles the problem of answering maximal probabilistic Top-K tuple set (MPTopKTS) queries on uncertain data streams under a sliding-window model, and presents an algorithm for processing such queries. Each tuple contains a data item x, a score f(x), and an existential probability p(x). Based on the sliding-window model, we design three synopsis tables that store the tuples of the current window ordered by arrival time, by score, and by probability, respectively. For prefixes of increasing length in the score ordering, the algorithm selects the k tuples with the highest probabilities from among the highest-scoring tuples. It then computes the probability of each candidate Top-K tuple set and returns the one with the highest probability as the answer to the MPTopKTS query. We prove the correctness of the algorithm theoretically. Our experimental results show that the algorithm requires less time and space than other similar algorithms.
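The selection step described above can be illustrated with a minimal sketch over a single window snapshot. It is not the paper's implementation: it re-sorts the window instead of maintaining the three synopsis tables incrementally, and the function name `mp_topk_tuple_set` is hypothetical. It also assumes that tuples are independent, that scores are distinct, and that the probability of a candidate set is the product of p(x) over its members times the product of 1 - p(x) over the other tuples in the same score prefix. The abstract does not state these definitions, so they are assumptions made for illustration.

```python
from math import prod


def mp_topk_tuple_set(window, k):
    """Return (items, probability) of the most probable Top-K tuple set.

    `window` is a list of (item, score, probability) tuples representing the
    current sliding window. Assumptions (not stated in the abstract):
    tuples are independent and scores are distinct.
    """
    if k <= 0 or len(window) < k:
        return None, 0.0

    # Order the window by score, highest first (the "score" synopsis).
    by_score = sorted(window, key=lambda t: t[1], reverse=True)

    best_items, best_prob = None, -1.0
    # Examine every prefix of the score ordering that holds at least k tuples.
    for i in range(k, len(by_score) + 1):
        prefix = by_score[:i]
        # Within the prefix, pick the k tuples with the highest probabilities.
        ranked = sorted(range(i), key=lambda j: prefix[j][2], reverse=True)
        chosen = set(ranked[:k])
        # Chosen tuples must exist; the remaining prefix tuples must be absent.
        p = prod(
            prefix[j][2] if j in chosen else 1.0 - prefix[j][2]
            for j in range(i)
        )
        if p > best_prob:
            best_prob = p
            best_items = [prefix[j][0] for j in sorted(chosen)]
    return best_items, best_prob


# Illustrative window of (item, score, probability) tuples; values are made up.
example_window = [("a", 10, 0.3), ("b", 9, 0.9), ("c", 8, 0.8), ("d", 7, 0.6)]
example_items, example_prob = mp_topk_tuple_set(example_window, 2)
```

In this made-up window the two highest-scoring tuples are `a` and `b`, but `a` exists with probability only 0.3, so the set {`b`, `c`} is the more probable Top-2 answer under the assumed definition. This shows why the candidate sets must be compared across several prefix lengths rather than taken from the first k scores alone.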