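Privacy-preserving data mining is a product of combining information security with knowledge discovery. This paper proposes PP-SPM, a privacy-preserving sequential pattern mining algorithm. Its guiding principle is to lower the support of restricted sequential patterns by modifying sensitive data in the original database. The algorithm first builds a SPAM sequence tree and extracts the sensitive sequences from it according to heuristic rules; it then locates the corresponding sensitive data in the original database and applies Boolean operations to them, thereby sanitizing the database. Experiments show that, with privacy fully protected, the algorithm loses 2% of the sequential patterns on the D6C10T2.5S4I4 dataset after modifying 3.5% of the original data.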
Privacy-preserving data mining combines information security technology with knowledge discovery technology. This paper proposes PP-SPM, an algorithm for privacy-preserving sequential pattern mining. It focuses on reducing the support of restrictive sequential patterns by modifying the sensitive data in the original database. First, a SPAM tree is built and the sensitive sequences are found from the tree. Then the sensitive data are selected in the original database and removed from the transactions according to heuristic rules. PP-SPM fully protects privacy while hardly affecting sequential pattern mining. Experimental results show that, for the D6C10T2.5S4I4 dataset, the loss ratio of sequential patterns is 2% after 3.5% of the data have been modified.
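The core idea, lowering the support of a restrictive pattern by editing the sequences that contain it, can be illustrated with a minimal Python sketch. This is not the PP-SPM algorithm itself: it builds no SPAM tree, and the victim-selection rule (always delete one fixed item of the pattern's first element) is a simple stand-in heuristic assumed here for illustration. The names `find_match`, `support`, `sanitize`, and `seq` are hypothetical.

```python
from typing import FrozenSet, List, Optional, Tuple

Sequence = List[FrozenSet[str]]  # a customer sequence: ordered list of itemsets


def find_match(pattern: Sequence, sequence: Sequence) -> Optional[List[int]]:
    """Return positions of the leftmost embedding of pattern in sequence, or None."""
    positions, start = [], 0
    for element in pattern:
        for i in range(start, len(sequence)):
            if element <= sequence[i]:  # itemset containment
                positions.append(i)
                start = i + 1
                break
        else:
            return None  # this pattern element could not be placed
    return positions


def support(db: List[Sequence], pattern: Sequence) -> int:
    """Number of sequences in db that contain pattern as a subsequence."""
    return sum(find_match(pattern, s) is not None for s in db)


def sanitize(db: List[Sequence], pattern: Sequence,
             max_support: int) -> Tuple[List[Sequence], int]:
    """Delete items until support(pattern) <= max_support.

    Assumed heuristic (not from the paper): always delete the same item,
    taken from the first element of the sensitive pattern.
    Returns the cleaned copy of db and the number of deleted items.
    """
    cleaned = [list(s) for s in db]  # the input database is left untouched
    victim = min(pattern[0])
    modified = 0
    for s in cleaned:
        if support(cleaned, pattern) <= max_support:
            break
        # Repeat until this sequence no longer supports the pattern.
        while (match := find_match(pattern, s)) is not None:
            s[match[0]] = s[match[0]] - {victim}
            modified += 1
    return cleaned, modified


def seq(*elements: str) -> Sequence:
    """Hypothetical helper: seq("ab", "c") -> [{a, b}, {c}]."""
    return [frozenset(e) for e in elements]


db = [seq("a", "b", "c"), seq("ab", "c"), seq("a", "c"), seq("b", "c")]
sensitive = seq("a", "c")
cleaned, modified = sanitize(db, sensitive, max_support=1)
print(support(db, sensitive), support(cleaned, sensitive), modified)
```

In this toy database the sensitive pattern ⟨a, c⟩ is pushed from a support of 3 down to 1 by deleting two items, while the non-sensitive pattern ⟨b, c⟩ keeps its support. A real sanitizer would choose victims so as to minimize such side effects, which is the role the abstract assigns to the heuristic rules.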