Data streams are potentially unbounded, arrive rapidly, and change frequently, which makes privacy preservation on data streams a significant challenge. This paper introduces the concept of data stream anonymization and analyzes the characteristics of existing data stream anonymization algorithms. On that basis, it proposes a clustering-based approach to data stream anonymization and gives an implementation of the algorithm. Experiments on a real data set show that the new algorithm satisfies the anonymization requirements while reducing the information loss caused by generalization and suppression.
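To make the idea of anonymizing a stream through clustering concrete, the following is a minimal illustrative sketch and not the algorithm proposed in the paper. It assumes numeric quasi-identifiers only, a k-anonymity requirement, and a simple normalized-interval-width loss measure. The names `anonymize_stream`, `info_loss`, `bounds`, and the parameters `k`, `tau`, and `domain` are hypothetical and chosen for illustration. Each arriving record joins the open cluster whose generalization loss stays smallest, a cluster is released in generalized form once it holds `k` records, and records left in undersized clusters at the end are suppressed.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[float, ...]        # numeric quasi-identifier values (assumption)
Interval = Tuple[float, float]    # generalized value: (low, high)


def bounds(records: List[Record]) -> List[Interval]:
    """Per-attribute (min, max) interval covering all records (generalization)."""
    return [(min(col), max(col)) for col in zip(*records)]


def info_loss(records: List[Record], domain: List[Interval]) -> float:
    """Mean interval width normalized by the attribute domain; 0 means no loss."""
    widths = [
        (hi - lo) / (d_hi - d_lo)
        for (lo, hi), (d_lo, d_hi) in zip(bounds(records), domain)
    ]
    return sum(widths) / len(widths)


def anonymize_stream(
    stream: Iterable[Record], k: int, tau: float, domain: List[Interval]
) -> Iterator[Tuple[List[Interval], int]]:
    """Yield (generalized intervals, record count) for each released cluster.

    tau is a hypothetical upper bound on the loss a cluster may reach;
    a record that would push every open cluster above tau starts a new one.
    """
    open_clusters: List[List[Record]] = []
    for rec in stream:
        # Pick the open cluster whose loss after adding rec is smallest.
        best = min(
            open_clusters,
            key=lambda c: info_loss(c + [rec], domain),
            default=None,
        )
        if best is not None and info_loss(best + [rec], domain) <= tau:
            best.append(rec)
        else:
            best = [rec]
            open_clusters.append(best)
        if len(best) == k:
            # Release the cluster: all k records share the same intervals.
            open_clusters = [c for c in open_clusters if c is not best]
            yield bounds(best), k
    # Records still in open clusters never reached size k and are suppressed.


if __name__ == "__main__":
    demo = [(25, 50000), (27, 52000), (60, 90000), (26, 51000),
            (62, 91000), (61, 95000), (40, 10000)]
    for intervals, count in anonymize_stream(
        demo, k=3, tau=0.2, domain=[(20, 70), (0, 100000)]
    ):
        print(count, intervals)
```

This sketch ignores the delay constraint that a practical stream anonymizer would need (records should not wait indefinitely for their cluster to fill), and it handles only numeric attributes. Categorical attributes would require a generalization hierarchy instead of intervals.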