To address the problem of anonymization on data streams, an anonymization algorithm based on time density is proposed. To account for the strongly temporal nature of data streams, the concepts of time weight and time density are introduced: when the number of published clusters reaches its upper limit, the cluster with the lowest time density is deleted, which preserves the reusability of the published clusters. In addition, to maintain high execution efficiency, the algorithm scans the data in a single pass. Experimental results on a real dataset show that the proposed method achieves high efficiency while preserving good data utility.
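The eviction rule for published clusters can be illustrated with a minimal sketch. The abstract does not give the definitions of time weight or time density, so the code assumes an exponentially decaying weight per tuple and takes a cluster's time density to be the sum of its members' weights, a common choice in density-based stream clustering. The names `time_weight`, `PublishedCluster`, `publish`, and the parameters `decay` and `max_clusters` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


def time_weight(arrival: float, now: float, decay: float = 0.1) -> float:
    """Assumed time weight: halves every 1/decay time units of age."""
    return 2.0 ** (-decay * (now - arrival))


@dataclass
class PublishedCluster:
    label: str
    # Arrival times of the tuples generalized into this cluster.
    arrivals: list = field(default_factory=list)

    def time_density(self, now: float, decay: float = 0.1) -> float:
        """Assumed time density: sum of the members' time weights."""
        return sum(time_weight(t, now, decay) for t in self.arrivals)


def publish(clusters, new_cluster, now, max_clusters, decay=0.1):
    """Store a newly published cluster for later reuse.

    If the store is already at its upper limit, the cluster with the
    lowest time density is deleted first. Returns the evicted cluster,
    or None if nothing was evicted.
    """
    evicted = None
    if len(clusters) >= max_clusters:
        evicted = min(clusters, key=lambda c: c.time_density(now, decay))
        clusters.remove(evicted)
    clusters.append(new_cluster)
    return evicted


# Usage: with a limit of two clusters, the one holding only old tuples
# has the lowest time density and is the one deleted.
store = [
    PublishedCluster("old", [0.0, 1.0]),
    PublishedCluster("recent", [9.0, 10.0]),
]
removed = publish(store, PublishedCluster("new", [10.0]), now=10.0, max_clusters=2)
```

Under these assumptions, clusters that keep absorbing recent tuples retain a high density and stay available for reuse, while clusters that have not matched recent data decay and are removed first.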