Data stream joins often serve as a supporting algorithm for data stream query operations. Previous algorithms mostly consider periodically evolving data streams and rarely address joins over non-periodic streams. A data stream join algorithm for streams that follow a shifting Gaussian distribution is proposed. Sampling statistics are used to determine the current Gaussian center point, and the data blocks are partitioned around that center. A method for determining which data blocks take part in the join under the shifting Gaussian distribution is also proposed. Experiments show that, compared with similar algorithms, the proposed algorithm achieves a higher join rate and a lower I/O cost under limited memory.
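The two steps named in the abstract, estimating the current Gaussian center from a sample and partitioning blocks around it, can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the function names (`estimate_center`, `block_index`, `partition`, `resident_blocks`), the fixed block width, and the rule of keeping the blocks nearest the center within a tuple budget are all assumptions made for illustration.

```python
import math
import random
import statistics


def estimate_center(window, sample_size=200, seed=0):
    """Estimate the current Gaussian center (mean) and spread from a sample.

    Hypothetical helper: the abstract only says the center is found by
    sampling statistics; uniform random sampling is an assumption.
    """
    rng = random.Random(seed)
    if len(window) <= sample_size:
        sample = list(window)
    else:
        sample = rng.sample(list(window), sample_size)
    return statistics.mean(sample), statistics.pstdev(sample)


def block_index(value, center, width):
    """Signed index of the fixed-width block holding `value`.

    Block 0 is centered on `center`; negative indices lie below it.
    """
    return math.floor((value - center) / width + 0.5)


def partition(values, center, width):
    """Group join-key values into blocks laid out around the center."""
    blocks = {}
    for v in values:
        blocks.setdefault(block_index(v, center, width), []).append(v)
    return blocks


def resident_blocks(blocks, budget):
    """Pick the blocks to keep in memory, nearest the center first.

    Assumption: blocks near the center are densest and therefore the most
    likely to produce join matches, so they are kept until the tuple
    budget would be exceeded.
    """
    chosen, used = [], 0
    for idx in sorted(blocks, key=lambda i: (abs(i), i)):
        size = len(blocks[idx])
        if used + size > budget:
            break
        chosen.append(idx)
        used += size
    return chosen
```

In this sketch a caller would re-run `estimate_center` on each new window, so that the block layout follows the center as it drifts, and would spill the blocks not returned by `resident_blocks` to disk.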